Random Thoughts: Manic Episodes & Venture Capital Industry Rabbit Hole

When I was a second- or third-year political science student, I got this unexpected and funky idea. I forget exactly how I ended up there, but I do remember that I was extremely interested in political communication and campaigning at the time: I would become a film director sometime in the future. I would study film and dramatic arts once I was done with political science. I was a night owl then; I loved learning while everyone was sleeping, at peace, with no phone calls or any other distractions. Stillness, from which you can advance to truly magnificent zones of extreme focus.

So, what I did: I prepared myself a list of 60 directors to explore, a random number, everything from Kubrick to Gaspar Noé and Michael Haneke. Downloaded Blu-ray versions of everything I could find, student works included. Went in methodically, like a surgeon. I had this notebook in which I would keep track of all the small tricks directors used to design narratives and generate magical and transformative experiences. A few random favorite gems: Gaspar Noé’s “I Stand Alone” and Haneke’s “The Seventh Continent”. If you want to experience real violence on the screen, the kind that shakes you existentially, not in a primitive and cheap way, trust me, check those two.

I did not become a film director, and I probably won’t. A few years later, I was trying to assess the practical value of those long winter nights, a self-imposed movie hype that lasted for almost five months. Maybe I gained a unique communication skill? Or was I sharpening my analytics skill? Would it be valuable in my later life, and how? I enjoyed it though, which should be enough I guess.

The latest manic attack that I am experiencing is the venture capital industry. How did I get there?

I probably started asking myself about the financial roots and origins of what’s happening around. All this crazy and experimental stuff that is being built, someone needs to fund it. How does it work? All this impact costs – who funds it, why, and how? It’s like a background, a backstage of reality we encounter every day. I was interested, so I started thinking and learning.  

It’s also that I want to build. I need to know the funding logic and procedures because I am still far away from serious bootstrapping capabilities.

It’s also so cool: imagine, you are in your late fifties or sixties, and young brilliant folks are pitching to you? It’s like a fountain of youth. You are supporting, financially and otherwise, the creations of something wow.

Something clicked, and I was hooked.

In an almost year-long journey, I have collected more than 1100 learning resources so far. Everything is there: 200 YouTube/podcast videos, books, scientific papers, long reads, high-quality articles, encyclopedia-type writings, interviews, etc. Legal, finance, entrepreneurship, and so on. I’ve tried to cover all the possible aspects of the industry: its roots and origins, prehistory, history, the most significant and illustrative case studies, mechanics, math, trends, uses of different technologies in the process. All that I could think of and all that I’ve stumbled upon. It’s incredible how much you can learn from YouTube videos if you listen carefully. Casual conversations are the best because friendly talks go places formal formats don’t. I watched almost all “This Week in Startups” videos. JCal, you’re a legend!

By the way, one of the things that I’ve figured out along the way: I love reading the most because it’s the fastest knowledge acquisition technique, at least in my case.

What would be interesting to share?

Let’s start before it actually started, with true silicon pioneers. Robert Noyce, Intel founder, and his partner are starting a company. At that time, the money flows from the East to the West. Noyce and his partner create a short document, a few pages, with so many typos, and incorporate soon after. That’s how it was done in the earliest days. Please watch this great documentary. You won’t regret it if you’re even remotely interested in any of this.

One of the naughtiest, Sean Parker. This is a great read. Peter Thiel now runs one of the most prestigious VC companies. He was the first Facebook investor, which made him a fortune and allowed him to continue with venturing endeavors. Sean was the one who made the introduction, the first to see the behemoth Facebook would become. Many claim that he saw it even before Zuckerberg himself.

Facebook’s fundraising is very interesting from a historical perspective. Sean had a clash with Sequoia, probably the number one all-time VC firm. So Mark pulled a prank, showing up late to the meeting and dressed funny. This surreal episode, claims JCal in this great interview, had a great impact. It would become much more about founders, hence the “Founders Fund”. A structural shift in founders <> VCs governance relations. It was so common for investors to just throw founders under the bus prior to that change. Zuckerberg’s position, thanks to Sean, was so strong that even today he rules the company, with his power undiluted, which is uncommon in the Valley.

Speaking of Sequoia, after the legendary Don Valentine, two immigrants were put in charge. What’s interesting is that one is a journalist from Wales and the other is a salesperson, an Italian immigrant. Neither finance people nor even tech people, and a journalist as one of the most successful VCs in history. I would recommend these two conversations: Michael Moritz and Doug Leone.

Unlike Sequoia, Kleiner Perkins, which was top of the class, is losing its position. How does that happen? John Doerr is still the wealthiest VC, I think. While still prestigious, Kleiner is not what it used to be.

I also liked this crazy Bulgarian guy, Delian Asparouhov, who wears sneakers of different colors. You should watch this interview, or this one. I love people whose minds just don’t stop. Interesting reasoning. Like his space passion, and how, contrary to the classical founder route (build (decades) > cash (found/invest)), as a VC he can contribute to space exploration immediately. Many interesting points about space in both interviews. Strongly recommend them. It’s incredible how practical space is, even for what’s happening here, on planet Earth.

And then, the “Tiger phenomenon”, magnificently explained in this article. In short, private markets, and Tiger Global just pumping huge amounts of money in, in record times, say weeks, doing deals almost every other day, without requesting any control in return. Meaning, public investors moving to private markets, hot markets, and so many other things. Consequences? So many. Read the above article. And crazy, aggressive  Masa Son (SoftBank/Vision Fund) playing with hundreds of billions.

I must admit that I am enjoying this journey so far!

What shall I do with all of this?

Daily Random Thoughts #8 – Relationships #1: “Ethical Relationships” (13/11/20)

Recently, I’ve been thinking a lot about something I call “ethical relationships” (the concept probably already exists?).

What are “ethical relationships”? Let’s try to play with the concept.

1) Benefit of the Other

They are positive-sum games where both sides sincerely care about each other. They both, whether lovers or friends, proactively do things that they think will benefit the Other. If one side feels better, the other feels laughter. Wins are shared.

2) Growth

Both sides grow as a consequence of the relationship. They become better.

Those relationships are brave, on an intimate level, in the sense that they step outside usual boundaries, which makes both parties stronger.

3) Transparency

I am fascinated with transparency as a principle. In the case of human relationships, it would mean that honesty has the highest value and that intentions are communicated clearly. Feelings are not hidden but shared. It makes everything (more) flawless. And faster. 

4) Good for the World

Besides being beneficial for both actors, they are also great for the World. People who appreciate and trust each other are good for their human environment. Energy spreads.

Daily Random Thoughts #7: Why Everyone Should Start an NGO as Early as Possible (11/11/20)

You will learn a ton and earn things money can’t buy.

What is an NGO? An organization working for a specific cause. Fighting for the cause is always a good thing. Learning about organizing (and people in general) is one of the most rewarding experiences because organizations are everywhere, and everything is an organization.  

This is an (unfinished!) list of the things you will learn: legal and taxes; fundraising; management; project management; human resources; strategic communications; marketing; accounting; public relations; public speaking; networking; leadership; sales; community organizing.

Sounds like a mini MBA? 

How to do it? List things you care about. May it be pets, astronomy (was my case in high school!), sport, startups, elders… Refugees, digital government, peace, or clean water. So many things that aren’t right and can be improved! Find those who care about the same thing(s). You’re not alone. You’re never alone. And just start. Learn along the way.

You’ll meet new friends, the most meaningful kind: those who share your values. From a strict investment perspective, you will benefit greatly. Up to a few times bigger income, if you decide to shift to business years later. Skills and relationships are probably the two most important things for any professional.

It’s important to found/co-found because then you need to do everything.  Also, the feeling when you realize that you have actually built something that matters from scratch. Phenomenal!

It’s a social capital game; personal gains are truly exponential, but it’s also great for society. People, together, becoming relevant and influencing society in positive ways.

I think I will write an in-depth essay on this!

Daily Random Thoughts #6: Changing – And Even Inventing – The Past (13/10/20)

I was so fascinated with the possibility of changing the Past that I wanted to do an MA on the subject once, 5 years ago. My angle would be propaganda focused on changing the past, and how operations like that influence behavior and the future.

Past should be – done? Fixed? Wrong. On both “historical” and personal level, the Past can change.

On a personal level.

I read “The Schopenhauer Cure” by Irvin D. Yalom in a day. It was my personal reading record back then, 400 pages in a day. At some point in the story, a very powerful thought of Nietzsche’s is introduced.

“To change ‘it was’ into ‘thus I willed it’—that alone shall I call redemption.”

You are probably familiar with the genius music video for Massive Attack’s “Angel”?

It’s something like a revolution when things change like that. Past changes, as it’s only and always an interpretation, so the future becomes different. That past-future relationship is very interesting!

On a historical level.

I remember one particular lecture on the history of Serbian political thought while attending political science classes in Belgrade. Professor was naming early 19th-century Serbian philanthropists and explaining their contributions, a lot of them. After a while, he curiously asked us: “Do you feel better now? Did you know that these men were your fellows from the past?”. It was intriguing. We did not know any of them. And we did feel better.

How we understand ourselves, collectively, influences not just how we feel, but also how we will act in the future. That’s, for example, one of the functions of myths, right?

On a philosophical level.

Past is always “in relation to us”? Hence it’s always facts plus interpretation, and that interpretation is what matters the most. It opens so many questions about truth, but that’s a whole other ground.

Daily Random Thoughts #5: Abraham Lincoln And The Rabbit Hole (12/10/20)

Did you know that you can be a lawyer without even going to law school? Abraham was one of them. And did you know that you can get a PhD even without a BA? And not just by founding a future trillion-dollar company, like Zuck or Bill.

That night I was interested in Abraham Lincoln, one of the most cherished US presidents. I love getting to know interesting, influential historical people, deeply. Abraham was such a masterful politician. He hacked his way to the Presidency and did so many splendid things. I watched this documentary as a first introduction and I will definitely dig more into him in the future. It was an inspiring night; I fell into a deep rabbit hole.

I did not know that Republicans were the ones against slavery and that Democrats were largely OK with it. A short video on the white supremacy legacy of the Democratic party. A short video on how the Republican party went from Lincoln to Trump. It’s interesting how things radically change over time. Lincoln was huge.

Then I was interested in Abraham’s (formal) education. You should read this incredible article about how to become a lawyer without going to law school. Lincoln was self-taught. So were many others. When you try to catch the bigger picture, you find so many great self-taught individuals, and realize that it’s actually a very strong tradition, especially in the US.

In that regard, it’s interesting to think about what Peter Thiel is doing with his Thiel Fellowship. For those not familiar, it’s $100,000 to drop out, even from high school. It does make sense: “Build new things instead of sitting in a classroom.” It’s at least 10x learning. Yet again, Peter Thiel, with whom I would not agree on so many points, claims that higher education is an insurance policy, and a bad one, actually. Google it, there are a few very interesting hypotheses.

Then I ended up reading about PhDs without an MA – possible in the US – and found out that you can even get a Doctorate of Philosophy without finishing undergraduate studies. Great ones are always an exception, Ludwig Wittgenstein comes to my mind first, but it’s possible even for “mortals”. For example, you can start digging here. It’s kind of obvious though: a PhD should just be your authentic scientific contribution to the world; why would anyone care if you paid a lot of cash and spent 3 years preparing for it? Have a great scientific contribution? And 3 professors and an institution to assess it, a procedure? Great: thanks for your contribution, you’re a PhD.

Then I stumbled upon the example of former MIT Media Lab Director Joi Ito, who later resigned because of his ties to Jeffrey Epstein. He led one of the most interesting academic institutions in the world without, well, any significant formal academic credentials. Interesting bio.

I love rabbit holes. You can get inspired and learn so much.

Daily Random Thoughts #4: What If You Could Clone And Multiply Yourself? (11/10/2020)

So, we’re in Croatia, on the dance floor. Hunee is playing trancy ambiental and obscure disco music. It’s lovely. I have this nasty habit of thinking while dancing. A close friend even said to me once: “People go out to get drunk and make a mess; you go out to dance and think!”

The music was still catching heights when I got lost in thinking about how wonderful it would be to clone yourself. Selling your time for cash, that’s frustratingly limited. What if there was a way to multiply it, several, even unlimited times, all in parallel?

Think about it. If all your work generates a digital output (design / code / writing / even voice), it wouldn’t matter if it was done by a machine, computer, instead of you. Let’s just say it’s on your behalf. If what’s truly you, skill-wise, and what you can do, could be learned and easily reproduced in novel contexts, that would be such a game-changer.

OK, something, an advanced AI system, would need to scan a) all of your past works and b) your current processes to be able to produce output as yours. For the past output, if it’s already digital – there’s enough data for the algorithm to be trained. Your style can be mastered. For the second one, would it be possible without brain-computer interfaces? Can your unique skills, as in the unique process behind them, be learned by a machine?

Then I got lost fantasizing how it would be so rad if it was possible. If possible:

It would be such empowerment for all those selling their digital time: Instead of, say, a few dollars an hour, it could be much more. More equality, and opportunities, as a consequence.

From the perspective of the whole economy: What a productivity boost that would be! All the great and necessary things could infinitely accelerate.

Elon Musk is a big-league psycho, in a good, even magnificent way. Neuralink – this is obviously an afterthought because Neuralink wasn’t born when that dance happened – might be part of the solution here. Or something similar. General AI technologies like GPT-3 or some future GPT-9 might be part of the alchemic equation.

I concluded that it will be possible in the future in one way or another and that it will be mega exciting, and continued to dance.

Daily Random Thoughts #3: Superskills and Sales In Particular (10/10/2020)

My dear friend Lav surprised me with “The Last Safe Investment”, a book on contemporary life, a few years ago. One of the most interesting things in it is the exploration of so-called superskills. Those are skills that are universally applicable and can bring exponential returns. One of them is particularly interesting: sales.

It’s one of the most rewarding ones. Everyone should learn it. It’s a colossal investment.

First of all, it’s quite universal. The world will always need salespeople because sales is what makes the economy work. Any industry needs it. Any business needs it. The market cries out for skilled salespeople. You can practice in your street, selling candies. I am not kidding 🙂 Great salespeople can become really wealthy. If that’s your goal.

You don’t have to be a salesperson, it doesn’t have to be your career choice. It can be stressful, and it doesn’t fit the character of many. Sales skill is probably the greatest multiplier of all the skills. No matter what you are good at and what your passion is, add a sales superskill to it and everything will be multiplied. Academic? Publish your Ph.D. work in top media publications and that’s a few thousand $$ more. Programmer? How about earning 3 times more? It’s sales. Student? Get that prestige scholarship and paid internship. Entrepreneurship is mostly about sales. Sales get you better hires and media hype. 

What is sales? Action that leads to the desired transaction. And, if you think about it more deeply, everything is a transaction. A date, your salary, any pitch, no matter how naive or big.

For more on superskills you should read the book. I would definitely recommend it.

Daily Random Thoughts #2: Luck is More Accessible Than Ever (09/10/20)

Some people are in the right place at the right time, and just that, sometimes even literally, makes them millionaires. Some are lucky in the sense that they never experience poverty or anything close to it, or because they’re born athletic, with an IQ of 180, or because they were loved by both parents, or because they had parents around them at all. Just a random thing like country of birth matters a hell of a lot.

I don’t believe that what we are given as a starting point can’t be totally, and I mean totally, changed. Even luck-wise. Nietzsche was my favorite philosopher for a while, I did graduate work on his philosophy in high school. I deeply believe in so-called “voluntarism”, Will being the mightiest force in the universe. If we can decide to be luckier, how do we do it?

I believe that the keyword is “network”. And exactly that’s why I think we live in an age when luck is more accessible than ever before. 

What is luck after all? A random, largely unexpected major win? If you’re exposed more, it can hit you more often. It’s a probability thing. If it’s a probability thing, then the Internet and all of its products are the best thing ever. And if it’s mostly about networks, then our conscious, decisive plays can make a significant change. It can become some sort of a personal revolution.

The Universe, World, God, Life, or whatever you believe in, can be atrocious to you. Still, I do believe, you can, proactively and directly influence the level of your Luck. You first need to be (self)conscious about it. 

All the opportunities (luck being one of them) float through networks, as information. That fabulous, life-changing opportunity, someone serving you a 10x or 100x wizardry in a second, it’s there, somewhere within the network. Those networks are primarily made of people. People talk to each other, people share, people love and enjoy giving. Many get lucky along the way, without a conscious intention. So many Lucks occur as consequences of random interactions.

Let’s focus on the “right person in the right place at the right time” for a bit. Ok, it’s about location, and that’s highly relative, in a sense that borders often act as the evil cages. Then migrate. As more places where things actually happen move online (FB groups, Discord or Slack channels, forums, email lists and so on), it’s easier than ever before to be where the magic happens. Luck strikes in magical places. You can find them.

Why are some people luckier? The first reason is that they are there. The second, obvious one, is that they have access to people other mortals don’t. Think of alumni networks, rich parents, friends of parents, homies from the hood, kindergarten bros and sis. Hence the keyword is “network”. You can’t compete with that. You probably won’t be the luckiest person in the world. But you can, I am 100% sure of it, decide to be luckier, and act upon it.

What can one do, for example, to become luckier?

Learn to network. Do it all the time.

I believe this is the most important. It’s all about relationships. That is what makes some privileged, right? The truth is: anyone can do it. Invest in people. Find your tribe. The world is composed of tribes. Find yours. And then, on some random day, someone will casually mention something that will change your life forever and you’ll be lucky.

Even highly introverted people are not that shy online. You don’t have to speak in front of 500 people you don’t know. You can use hashtags on Twitter. Or Instagram. Or LinkedIn. Whatever one’s passions and interests are, there’s a very high probability that there are hundreds of others around the planet who share them. That fit is natural and it’s really powerful. Welcome to your network!

Be more open.

Some people see things others don’t and act when they smell Luck. They’re luckier because they’re more open. Be aware of your environment and the world around you. Listen. Watch. Many people don’t see at all and it’s such a shame. Say YES and “why not?” more often. Anyone can become more open. It’s super easy. Drives Luck as well.

Content.

I love the content on the Internet. It’s the most egalitarian thing in the world: it doesn’t matter if you have a Ferrari in your garage or Prada shoes on your feet. It’s cheap to produce. The magnificent ways content can lead to Luck are nuts. One morning you can receive a life-changing message because someone has stumbled upon you accidentally and, oh, you’re lucky!

Learn to be more proactive.

I guess it takes a while, but it’s achievable. It’s a prerequisite for so many great things in life. The gym is one of the little hacks. There are so many of them.

Seriously, what are you doing today to be luckier tomorrow? The world with luckier people would be a much happier place to live in. Everyone should try to learn to be luckier.

Daily Random Thoughts #1: What If It’s Just Nonsense? (08/10/2020)

“Conspiracy theories” are not always just paranoia and overdramatizing. People do try to conspire when they can. The history of secret diplomacy would be one instance, where countries agree on borders and other important stuff behind the back of the rest of the world. Or when business leaders illegally draft secret masterplans over coffee in a random hotel bar. It’s natural, and rational, sometimes even legitimate.

What’s much more interesting though, on the opposite side of the spectrum, is where the reason for X is pure stupidity. Quite often, I assume, unexpected personal reasons demolish countries, too. Sometimes it’s just nonsense.

One of my favorite movies is “I Stand Alone”, written and directed by Gaspar Noé. In short: a series of misunderstandings ruins a guy’s life. It kept me wondering a lot.

A few days ago I heard a very interesting story about a surreal, almost existential-risk episode. Some civilian rebels in Pakistan, I was told, captured a building without knowing that a nuke was inside. Shit happens for no particular reason.

It might not be anybody’s intention, a plan, but rather “just nonsense”.

Wikidata and the Next Generation of the Web – Goran Milovanović, Data Scientist for Wikidata @Wikimedia Deutschland

The world without Wikipedia would be a much sadder place. Everyone knows about it and everyone uses it – daily. The dreamlike promise of a great internet encyclopedia, accessible to anyone, anywhere, all the time – for free, has become a reality. Wikidata, a lesser-known part of the Wikimedia family, is the data backend system of all Wikimedia projects, and fuels Apple’s Siri, Google Assistant, and Amazon’s Alexa among many other popular and widely-used applications and systems.


Wikipedia is one of the most popular websites in the world. It represents everything glorious about the open web, where people share knowledge freely, generating exponential benefits for humanity. Its economic impact can’t be calculated; being used by hundreds of millions, if not billions of people worldwide, it fuels everything from the work of academics to business development.

Wikipedia is far more than just a free encyclopedia we all love. It’s part of the Wikimedia family, which is, in their own words: “a global movement whose mission is to bring free educational content to the world.” To summarize its vision: “Imagine a world in which every single human being can freely share in the sum of all knowledge.”

Not that many people know enough about Wikidata, which acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.


Goran Milovanović is one of the most knowledgeable people I know. I invited him to lecture about Data Science potential in the public administration at the policy conference that I organized five years ago. We remained friends and I enjoy talking with him about everything that has ever popped in the back of my head. Interested in early Internet development and the role of RAND in immediate postwar America? No worries, he’ll speak 15 minutes about it in one breath.

Goran earned a Ph.D. in Psychology (a 500+ page one, on the topic of Rationality in Cognitive Psychology) from the University of Belgrade in 2013, following two years as a graduate student in Cognition and Perception at NYU, United States. He spent many years doing research in Online Behaviour, Information Society Development, and Internet Governance, co-authoring 5 books about the internet in Serbia. He has provided consultancy services in Data Science for Wikidata to Wikimedia Deutschland since 2017, where he is responsible for the full-stack development and maintenance of several analytical systems and reports on this complex knowledge base, and he runs his own Data Science boutique consultancy, DataKolektiv, from Belgrade.

He’s a perfect mixture of humanities, math, and engineering, constantly contemplating the world from a unique perspective that takes so many different angles into account. 

We focused our chat around Wikidata and the future of the web, but, as always, touched many different phenomena and trends.


Before we jump to Wikidata: What is Computational Cognitive Psychology and why were you so fascinated with it?

Computational Cognitive Psychology is a theoretical approach to the study of the mind. We assume that human cognitive processes – processes involved in the generation of knowledge: perception, reasoning, judgment, decision making, language, etc. – can essentially be viewed as computational processes, algorithms running not in silico but on the biologically evolved, physiological hardware of our brains. My journey into this field began when I entered the Department of Psychology in Belgrade, in 1993, following more than ten years of computer programming since the 80s and a short stay at the Faculty of Mathematics. In the beginning, I was fascinated by the power of the very idea, by the potential that I saw in the possible crossover of computer science and psychology. Nowadays, I do not think that all human cognitive processes are computational, and the research program of Computational Cognitive Psychology has a different meaning for me. I would like to see all of its potential fully explored, to know the limits of the approach, and then try to understand, to describe somehow, even intuitively only, what was left unexplained. The residuum of that explanatory process might represent the most interesting, significant aspect of being human at all. The part that remains irreducible to the most general scientific concept that we have ever discovered, the concept of computation, that part I believe to be very important. That would tell us something about the direction that the next scientific revolution, the next paradigm change, needs to take. For me, it is a question of philosophical anthropology: what is it to be human? – only driven by an exact methodology. If we ever invent true, general AI in the process, we should treat it as a by-product, as much as the invention of the computer was a by-product of Turing’s readiness to challenge some of the most important questions in the philosophy of mathematics in his work on computable numbers.
For me, Computational Cognitive Psychology, and Cognitive Science in general, do not pose a goal in themselves: they are tools to help us learn something of a higher value than how to produce technology and mimic human thinking.

What is Wikidata? How does it work? What’s the vision?

Wikidata is an open knowledge base, initially developed by parsing the already existing structured data from Wikipedia, then improved by community edits and massive imports of structured data from other databases. It is now the fastest-growing Wikimedia project, recently surpassing one billion edits. It represents knowledge as a graph in which nodes stand for items and values and links between them for properties. Such knowledge representations are RDF compliant, where RDF stands for Resource Description Framework, a W3C standard for structured data. All knowledge in systems like Wikidata takes the form of a collection of triples, or basic sentences that describe knowledge about things – anything, indeed – at the “atomic” level of granularity. For example, “Tim Berners-Lee is a human” in Wikidata translates to a sentence in which Q80 (the Wikidata identifier for “Tim Berners-Lee”) is P31 (the Wikidata identifier for the “instance of” property of things) of Q5 (the Wikidata identifier for a class of items that are “humans”). So, Q80 – P31 – Q5 is one semantic triple that codifies some knowledge on Sir Timothy John Berners-Lee, who is the creator of the World Wide Web by the invention of the Hypertext Transfer Protocol (HTTP) and the 2016 recipient of the Turing Award. All such additional facts about literally anything can be codified as semantic triples and composed to describe complex knowledge structures: in Wikidata, HTTP is Q8777, WWW is Q466, discoverer or inventor is P61, etc. All triples take the same, simple form: Subject-Predicate-Object. The RDF standard defines, in a rather abstract way, the syntax, the grammar, the set of rules that any such description of knowledge must follow in order to ensure that it will always be possible to exchange knowledge in an unambiguous way, irrespective of whether the exchange takes place between people or computers.
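
To make the triple idea a bit more concrete, here is a minimal Python sketch of a Subject-Predicate-Object store. It is only an illustration, not Wikidata’s API or storage format: the Triple class, the LABELS table, and the render and subjects_with helpers are hypothetical names made up for this example. The identifiers are the ones mentioned in the answer above; the two “discoverer or inventor” triples are assumed compositions of those identifiers and were not checked against the live Wikidata graph.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One Subject-Predicate-Object statement, using Wikidata-style identifiers."""
    subject: str
    predicate: str
    obj: str


# English labels for the identifiers mentioned in the text above.
LABELS = {
    "Q80": "Tim Berners-Lee",
    "Q5": "human",
    "Q466": "World Wide Web",
    "Q8777": "HTTP",
    "P31": "instance of",
    "P61": "discoverer or inventor",
}

# A tiny hand-written graph. The first triple is the one spelled out in the
# text; the other two are assumed for illustration and were not verified
# against the live Wikidata graph.
TRIPLES = [
    Triple("Q80", "P31", "Q5"),
    Triple("Q466", "P61", "Q80"),
    Triple("Q8777", "P61", "Q80"),
]


def render(triple: Triple) -> str:
    """Turn an identifier triple into a readable basic sentence."""
    return " ".join(LABELS.get(part, part) for part in triple)


def subjects_with(predicate: str, obj: str) -> list[str]:
    """Return every subject linked to `obj` through `predicate`."""
    return [t.subject for t in TRIPLES if t.predicate == predicate and t.obj == obj]


if __name__ == "__main__":
    for t in TRIPLES:
        print(render(t))
    # Everything in this toy graph whose inventor is Q80.
    print(subjects_with("P61", "Q80"))
```

Because every fact has the same three-part shape, a question like “what did Q80 invent?” becomes a simple filter over the list, which is roughly the intuition behind querying a much larger graph.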

Wikidata began as a project to support structured data for Wikipedia and other Wikimedia projects, and today represents the data backbone of the whole Wikimedia system. Thanks to Wikidata, much of the information that would otherwise be repeated across Wikipedia and other places is now kept once, as knowledge that can be served to our readers from a central repository. However, the significance of Wikidata goes way beyond what it means for Wikipedia and its sisters. The younger sister now represents knowledge on almost one hundred million things – called items in Wikidata – and keeps growing. Many APIs on the internet rely on it. Contemporary, popular AI systems like virtual assistants (Google Assistant, Siri, Amazon Alexa) make use of it. Just take a look at the number of research papers published on Wikidata, or using its data to address fundamental questions in AI. By means of the so-called external identifiers – references from our items to their representations in other databases – it represents a powerful structured data hub. I believe Wikidata nowadays has the full potential to evolve into a central node in the network of knowledge repositories online.

Wikidata External Identifiers: a network of Wikidata external identifiers based on their overlap across tens of millions of items in Wikidata, produced by Goran S. Milovanovic, Data Scientist for Wikidata @WMDE, and presented at the WikidataCon 2019, Berlin

What’s your role in this mega system? 

I take care of the development and maintenance of analytical systems that help us understand how Wikidata is used in Wikimedia websites, what the structure of Wikidata usage is, how human editors and bots approach editing Wikidata, how the use of different languages in Wikidata develops, whether it exhibits any systematic biases that we might wish to correct for, what the structure of the linkage of other online knowledge systems connected with Wikidata by means of external identifiers is, how many pageviews we receive across the Wikidata entities, and much more. I am also developing a system that tracks the Wikidata edits in real-time and informs our community if there is any online news relevant to the items that are currently undergoing many revisions. It is a type of position which is known as a generalist in the Data Science lingo; in order to be able to do all these things for Wikidata I need to stretch myself quite a bit across different technologies, models and algorithms, and be able to keep them all working together and consistently in a non-trivial technological infrastructure. It is also a full-stack Data Science position where most of the time I implement the code in all development phases: from the back-end, where data acquisition (the so-called ETL) takes place in Hadoop, Apache Spark, and SPARQL, through machine learning, where various, mostly unsupervised learning algorithms are used, towards the front-end development, where we serve our results in interactive dashboards and reports, and finally production in virtualized environments. I am a passionate R developer and I tend to use the R programming language consistently across all the projects that I manage; however, it ends up being pretty much a zoo in which R co-exists with Python, SQL, SPARQL, HiveQL, XML, JSON, and other interesting beings as well.
It would be impossible for a single developer to take control of the whole process if there were no support from my colleagues in Wikimedia Deutschland and the Data Engineers from the Wikimedia Foundation’s Analytics Engineering team. My work on any new project feels like solving a puzzle; I face the “I don’t know how to do this” situation every now and then; I learn constantly, and the challenge is so motivating that I truly doubt there can be many similarly interesting Data Science positions like this one. It is a very difficult position, but also a professionally very rewarding one.

If you were to explain Wikidata as a technological architecture, how would you do it in a few sentences? 

Strictly speaking, Wikidata is a dataset. Nothing more, nothing less: a collection of data represented so as to follow important standards that make it interoperable, usable in any imaginable context where it makes sense to codify knowledge in an exact way. Then there is Wikibase, a powerful extension of the MediaWiki software that runs Wikipedia as well as many other websites. Wikibase is where Wikidata lives, and where it is served from whenever anything else – a Wikipedia page, for example – needs it. But Wikibase can run any other dataset that complies with the same standards as Wikidata, of course, and Wikidata can inhabit other systems as well. If by technological architecture you mean the collection of data centers, software, and standards that make Wikidata join in wherever Wikipedia and other Wikimedia projects need it – well, I assure you that there is a huge and rather complicated architecture underlying that infrastructure. If you imagine all possible uses of Wikidata, external to the Wikimedia universe, run in Wikibase or otherwise… then… it is the sum of all technology relying on one common architecture of knowledge representation, not of the technologies themselves.

How does Wikidata overcome different language constraints and barriers? It should be language-agnostic, right?

Wikidata is and is not language-agnostic at the same time. It would be best to say that it is aware of many different languages in parallel. At the very bottom of its toy box full of knowledge, we find abstract identifiers for things: Q identifiers for items, P identifiers for properties, L for lexemes, S for senses, F for forms… but those are just identifiers, and yes, they are language-agnostic. But things represented in Wikidata have not only identifiers, but also labels, aliases, and descriptions in many different languages. Moreover, we have tons of such terms in Wikidata currently: take a look at my Wikidata Languages Landscape system for a study and an overview of the essential statistics.
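The split between language-agnostic identifiers and language-specific terms can be pictured with a small Python sketch. This is an illustration only, not the Wikibase data model: the `Item` class and the `label_for` helper are hypothetical, and the sample labels are illustrative values rather than data pulled from Wikidata.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """An identifier plus per-language labels (simplified, hypothetical model)."""
    qid: str                                    # language-agnostic identifier
    labels: dict = field(default_factory=dict)  # language code -> label


def label_for(item, lang, fallback="en"):
    """Pick the label in `lang`, then in `fallback`, then the bare identifier."""
    if lang in item.labels:
        return item.labels[lang]
    if fallback in item.labels:
        return item.labels[fallback]
    return item.qid


# Sample values for illustration only.
human = Item("Q5", {"en": "human", "de": "Mensch"})

if __name__ == "__main__":
    print(label_for(human, "de"))
    print(label_for(human, "sr"))        # no Serbian label here, falls back to English
    print(label_for(Item("Q80"), "en"))  # no labels at all, falls back to the identifier
```

The identifier stays the same for every reader, and only the label chosen for display changes with the language.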

What are knowledge graphs and why are they important for the next generation of the web?

They are important for this generation of the web too. To put it in a nutshell: graphs allow us to represent knowledge in the most abstract and most general way. They are simply very suitable to describe things and relations between them in a way that is general, unambiguous, and in a form that can quickly evolve into new, different, alternative forms of representation that are necessary for computers to process it consistently. By following common standards in graph-based knowledge representation, like RDF, we can achieve at least two super important things. First, we can potentially relate all pieces of our knowledge, connect anything that we know at all so that we can develop automated reasoning across vast collections of information and potentially infer new, previously undiscovered knowledge from them. Second, interoperability: if we all follow the same standards of knowledge representation, and program our APIs that interact over the internet so as to follow that standard, then anything online can easily enter any form of cooperation. All knowledge that can be codified in an exact way thus becomes exchangeable across the entirety of our information processing systems. It is a dream, a vision, and we find ourselves quite far away from it at the present moment, but one well worth pursuing. Knowledge graphs just turn out to be the most suitable way of expressing knowledge to achieve these goals. I have mentioned semantic triples, sentences of the Subject-Predicate-Object form, that we use to represent the atomic pieces of knowledge in this paradigm. Well, knowledge graphs are just sets of connected constituents of such sentences. When you have one sentence, you have a miniature graph: a Subject points to the Object. Now imagine having millions, billions of sentences that can share some of the constituents, and a serious graph begins to emerge.
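How a graph emerges from sentences that share constituents can be sketched in a few lines of Python. This is a toy example, not how Wikidata is queried in practice. The identifiers were mentioned earlier in this interview; the two P61 triples are assumed for the sake of the example, and `build_graph` and `reachable` are hypothetical helper names.

```python
from collections import defaultdict, deque

# Subject-Predicate-Object sentences; Q80 appears in all three, which links them.
TRIPLES = [
    ("Q80", "P31", "Q5"),     # Tim Berners-Lee, instance of, human
    ("Q466", "P61", "Q80"),   # assumed: WWW, discoverer or inventor, Tim Berners-Lee
    ("Q8777", "P61", "Q80"),  # assumed: HTTP, discoverer or inventor, Tim Berners-Lee
]


def build_graph(triples):
    """Turn triples into an adjacency list: subject -> [(predicate, object), ...]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Breadth-first search: every node that can be reached by following links from `start`."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


GRAPH = build_graph(TRIPLES)

if __name__ == "__main__":
    # Following links from WWW leads to its inventor and then to the class "human".
    print(sorted(reachable(GRAPH, "Q466")))
```

No single sentence links the World Wide Web to the class of humans, yet walking the graph connects them through the shared node Q80. That is the simplest form of relating separate pieces of knowledge to each other.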

A part of the knowledge graph for English (Q1860 in Wikidata)

Where do you think the internet will go in the future? What’s Wikidata’s role in that transformation?

The future is, for many reasons, always a tricky question. Let’s give it a try. On the role of Wikidata, I think we have clarified that in my previous responses: it will begin to act as a central hub of Linked Open Data sooner or later. On the future of the Internet in general, talking from the perspective of the current discussion solely: I do not think that the semantic web standards like RDF will ever reach universal acceptance, and I do not think it is even necessary for that to happen for us to enter the stage of internet evolution where complex knowledge is almost seamlessly interacting all over the place. It is desirable, but not necessary in my opinion. Look at the de facto situation: instead of evolving towards one single, common standard of knowledge and data representation, we have a connected network of APIs and data providers exchanging information by following similar enough, easily learnable standards – enough to not make software engineers and data scientists cry. Access to knowledge and data will become easier, and governments and institutions will increasingly begin to share more open data, increasing the data quality along the way. It will become a thing of good manners and prestige to do so. Data openness and interoperability will become one of the most important development indicators, tightly coupled with questions of fundamental human rights and freedoms. To have your institution’s open data served via an API and offered in different serializations that comply with the formal standards will become as expected as publishing periodicals on your work is now. Finally, the market: more data, more ideas to play with.

A lot of your work relies on technologies used in natural language processing (NLP), typically handling language at scale. What are your impressions of OpenAI’s GPT-3, which has generated quite a buzz recently?

It is fascinating, except that it works better in language production that can fool someone than in language production that exhibits anything like the traces of human-like thinking. Contemporary systems like GPT-3 make me wonder whether the Turing test was ever a plausible test to detect intelligence in something – I always knew there was something I didn’t like about it. Take a look, for example, at what Gary Marcus and Ernest Davis did to GPT-3 recently: “GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about”. It is a clear example of a system that does everything to language except understand it. Its computational power and the level up to which it can mimic the superficial characteristics of the language spoken by a natural speaker are fascinating. But it suffers – and quite expectedly, I have to add – from a lack of understanding of the underlying narrative structure of the events, the processes it needs to describe, the complex interaction of language semantics and pragmatics that human speakers face no problems with. The contemporary models in NLP are all essentially based on an attempt to learn the structure of correlations between linguistic constituents of words, sentences, and documents, and that similarity-based approach has very well-known limits. It was Noam Chomsky who in the late 50s – yes, 50s – tried to explain to the famous psychologist B. F. Skinner that just observing statistical data on the co-occurrences of various constituents of language will never provide for a representational structure powerful enough to represent and process natural language. Skinner didn’t seem to care back then, and neither do the fans of the contemporary Deep Learning paradigm, which is essentially doing exactly that, just in a way orders of magnitude more elaborate than anyone ever tried. I think we are beginning to face the limits of that approach with GPT-3 and similar systems.
Personally, I am more worried about the possible misuse of such models to produce fake news and fool people into silly interpretations and decisions based on false, simulated information than about whether GPT-3 will ever grow up to become a philosopher, because it certainly will not. It can simulate language, but only the manifest characteristics of it; it is not a sense-generating machine. It does not think. For that, you need some strong symbolic, not connectionist, representation engaged in the control of associative processes. Associations and statistics alone will not do.

Do humanities have a future in the algorithmic world? How do you see the future of humanities in the fully data-driven world?

First, a question: is the world ever going to be fully data-driven, and what does that mean at all? Is a data-driven world one in which all human activity is passivized and all our decisions transferred to algorithms? It is questionable whether something like that is possible at all, and I think that we all already agree that it is certainly not desirable. While the contemporary developments in Data Science, Machine Learning, AI, and other related fields are really fascinating, and while our society is becoming more and more dependent upon the products of such developments, we should not forget that we are light years away from anything comparable to true AI, sometimes termed AGI (Artificial General Intelligence). And I imagine only true AI would be powerful enough to run the place so that we can take a permanent vacation. But then comes the ethical question, one of immediate and essential importance: would such systems, if they ever come into existence, be able to judge human action and act upon society in a way we as their makers would accept as moral? And only then comes the question of whether we want something like that in the first place: wouldn’t it be a bit boring to have nothing to do and go meet your date because something smarter than us has invented a new congruence score and started matching people while following an experimental design for further evaluation and improvement?

Optimization is important, but it is not necessarily beautiful. Many interesting and nice things in our lives and societies are deeply related to the fact that there are real risks that we need to take into account because irreducible randomness is present in our environment. Would an AI system in full control prevent us from trying to conquer Mars because it is dangerous? Wait, discovering the Americas, the radioactive elements, and the flight to the Moon were dangerous too! I can imagine that humanity would begin to invent expertise in de-optimizing things and processes in our environments if some fully data-driven and AI-based world ever came into existence. Any AI in a data-driven world that we imagine nowadays can be no more than our counselor, except in the edge case that it develops true consciousness, should that turn out to be a realistic scenario. If that happens, we would have to seriously approach the question of how to handle our relationship to AI and its power ethically.

I do not see how the humanities could possibly be jeopardized in this world that is increasingly dependent on information technologies, data, and automation. To understand and discover new ways to interpret the sense-generating narratives of our lives and societies was always a very deep, a very essential human need. Does industrialization, including our own Fourth age of data and automation, necessarily conflict with a world aware of the Shakespearean tragedy, as it does in Huxley’s “Brave New World”? I don’t think it is necessarily so. I enjoy the dystopian discourse very much because I find that it offers so many opportunities to reflect upon our possible futures, but I do not see us living in a dystopian society anytime soon. It is just that somehow the poetics of the dystopian discourse are well-aligned, correlated with the current technological developments, but if there ever was a correlation from which we should not infer any fatalistic causation, this is the one.