
Greg Osuri, founder of Akash Network, shares his groundbreaking approach to decentralized cloud computing and how it's disrupting hyperscalers like AWS, Google Cloud, and Microsoft Azure.
243 Audio.mp3: Audio automatically transcribed by Sonix. This transcript may contain errors.
Greg: 0:00
There were two big challenges for AI training, right? One was data: there's a limit to how much data you can get to train on. And the second was energy. What we saw with DeepSeek is that it can use synthetic data, which is amazing. Using synthetic data, you can actually solve the data problem, but what we cannot solve is the energy problem.
Craig: 0:18
I think that's why it's very important, if you're doing training, to focus on distributing your training runs versus the traditional mechanism of centralizing them, because we're going to hit a cap in two years and we have no solutions. I'm past the point of looking for jobs, but I'm not past the point of looking for people to hire, and when I need to do that, I turn to Indeed. Imagine you just realized your business needed to hire someone yesterday. How can you find amazing candidates fast? Easy: just use Indeed. When it comes to hiring, Indeed is all you need. You can stop struggling to get your job posts seen on other job sites, because Indeed's sponsored jobs help you stand out and hire fast. With sponsored jobs, your post jumps to the top of the page for your relevant candidates, so you can reach the people you want faster, and it makes a huge difference. According to Indeed data, sponsored jobs posted directly on Indeed get 45% more applications than non-sponsored jobs. Plus, with Indeed sponsored jobs there are no monthly subscriptions, no long-term contracts, and you pay only for results.
Craig: 1:39
How fast is Indeed? In the minute I've been talking to you, 23 hires were made on Indeed, according to Indeed data worldwide. There's no need to wait any longer. Speed up your hiring right now with Indeed, and listeners of this show will get a $75 sponsored job credit to get your jobs more visibility. Go to Indeed.com/EyeOnAI (all run together, E-Y-E-O-N-A-I) right now, and support our show by saying you heard about Indeed on this podcast. Indeed.com/EyeOnAI for a $75 sponsored job credit.
Greg: 2:36
Good to see you again, Craig. Great to be here, thank you so much for having me. My name is Greg Osuri. I'm the CEO and founder of Overclock Labs, a core contributor to Akash Network. As background, I've been a programmer all my life, an open-source developer for a little over 15 years, and in the early days of my career I helped contribute to the container-native ecosystem.
Greg: 3:02
A lot of the software I've written is still used by projects like Kubernetes, Docker and whatnot, and we were big users of the cloud. We noticed that the cloud, as we know, is becoming increasingly important in our daily lives, considering most workloads are now hosted on the cloud, and we felt that as it gets more prominent, it is also getting more closed. So we felt the need for a transparent and open cloud, from how the resources are priced to how the resources are secured and distributed, in line with what we saw with Kubernetes, Docker and Linux. Really, we wanted a similar cloud, and that's where our work on Akash began. It began as an open-source project, and it still is a very mature open-source project at this point. The name Akash means, in Sanskrit, the sky; the sky is where the clouds are formed. So that's where the name came from. And we were inspired by the Supercloud concept from Cornell in 2015, where they proposed that if you decouple the resource layer, meaning the layer where the resources are acquired, from the control layer, the control plane, you can effectively create a mechanism where anyone can be a resource provider, and that resource provider doesn't necessarily need to be the controller. That's the concept of Supercloud.
Greg: 4:50
So, really, we created Akash as a mechanism to bring the control plane into a decentralized form, where no single party controls it; it's a community that essentially operates on consensus, with a resource plane where anyone with compute can plug in and offer it to a market at incredible prices. Today, Akash is perhaps the fastest-growing cloud and the most cost-efficient cloud. Apples to apples, for compute resources you have ten times the cost savings compared to your Amazons of the world, whereas for accelerator resources, like GPUs, you have nearly a two to three times cost advantage compared to the traditional cloud. That obviously is contributing to the incredible growth we're currently experiencing.
Craig: 5:49
So, as I understand it, part of the concept is that there's a lot of unutilized or underutilized compute in the world, data centers scattered around the world that are not being used at full capacity, and those people can make their resources available to anyone in the world, just as Google's server farms are available to anyone in the world. And the blockchain layer, the control layer, consists of what? Are they smart contracts that you enter into with the provider? How does the blockchain layer control the resources that are added to the network by people with excess compute?
Greg: 6:40
Of course. To understand the lay of the land: you essentially have about 7.2 million data centers in the world, and if you break them down, about 11,000 are professional data centers, meaning over one megawatt of capacity. The rest are smaller data centers; a closet in an office could count as a data center. These 11,000 data centers are owned by enterprises, and about a thousand of them are hyperscalers. If you set the hyperscalers aside, because they tend to have higher utilization due to their businesses, the 10,000 or so enterprise data centers have a utilization rate of somewhere around 15%. So there's enormous underutilization in these 10,000 professional, one-megawatt-plus data centers, and for the semi-professional-grade data centers we don't know the data very well; in some cases they have high utilization, in some cases they don't, but the fact remains there is heavy underutilization. Obviously, an AI data center has high utilization and a non-AI one has lower, and if you're a professional AI data center in particular, you tend to have a lot of use when you're training and not necessarily when you're not training. And a lot of times, if you're training, you need the latest and greatest chip, because the cost advantage is significantly higher with better chips. Now, the question is: what happens to the chips you made investments in? H100s are now being replaced by H200s. H100s are perfectly great chips; NVIDIA made about two million of them. But what happens to those chips once you're done with training? So there's a lot of underutilization and inefficient resource distribution in the landscape.
Now, what Akash does is take this compute: providers that have this compute can come and offer it on Akash, and tenants, meaning people that use compute, approach Akash with an ask. Essentially, they create an order saying, hey look, I am willing to pay $2 an hour for H200s, and this is my configuration, this is what I need. They place their order in an order book, and providers that can fulfill the order bid on it, essentially creating a reverse-auction marketplace. So the tenant here sets the price. If a provider can fulfill it, great; if they cannot, they obviously don't win the bid. Once a provider bids and the tenant accepts the provider's bid, a lease gets created. This is what we call the contract between the provider and the tenant. There's a requirement for the tenant to make an escrow payment: the tenant prepays, those funds are held in escrow, and the escrow payments are distributed to the provider as the provider fulfills the service. That's how it works.
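The order-book flow just described, where a tenant posts an ask with a price ceiling, providers underbid each other, and acceptance creates an escrow-backed lease, can be sketched roughly as follows. All names and structures here are illustrative, not Akash's actual API.

```python
# Illustrative sketch of the reverse-auction flow described above.
# Names and structures are invented for clarity; this is not Akash's real API.
from dataclasses import dataclass, field

@dataclass
class Order:                      # tenant's ask: "I'll pay up to this much"
    tenant: str
    spec: str                     # e.g. "8x H200"
    max_price: float              # $/hour ceiling set by the tenant
    bids: list = field(default_factory=list)

@dataclass
class Lease:                      # the contract once a bid is accepted
    tenant: str
    provider: str
    price: float
    escrow: float                 # prepaid funds, drawn down as service is delivered

def place_bid(order: Order, provider: str, price: float) -> None:
    # Providers can only bid at or below the tenant's ceiling (reverse auction).
    if price <= order.max_price:
        order.bids.append((provider, price))

def accept_lowest_bid(order: Order, deposit: float) -> Lease:
    provider, price = min(order.bids, key=lambda b: b[1])
    return Lease(order.tenant, provider, price, escrow=deposit)

order = Order(tenant="alice", spec="8x H200", max_price=2.0)
place_bid(order, "dc-tokyo", 1.8)
place_bid(order, "dc-austin", 1.5)
place_bid(order, "dc-oslo", 2.5)          # over the ceiling: rejected
lease = accept_lowest_bid(order, deposit=500.0)
```

Note that in this sketch acceptance simply takes the lowest bid; as discussed later in the conversation, real tenants can apply whatever selection strategy they like.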
Greg: 10:17
Now, the layer where the tenant creates the order is decentralized, it's open. Private information pertaining to the application is not exposed, whereas the resource information, meaning the resources requested and the price, is public. So you have this rich public data. It's phenomenal to look at how much people are paying for what, and that rich data is used by providers to price their resources. Once the provider and tenant get into an agreement, the blockchain goes away and the relationship becomes peer-to-peer. I am talking to the provider directly; there is no intervention, there's nobody in the middle.
Greg: 11:14
The blockchain comes into play to enforce the contract and enforce payments in a decentralized manner. Enforcing payments means that if the escrow account runs out of money, the workload gets undeployed; that enforcement happens on the control layer. If the provider doesn't fulfill the order, the lease gets canceled and the money goes back to the tenant. Every time somebody needs to pay for resources, that layer is enforced on the blockchain. The blockchain is just a coordination mechanism that enforces control, not an execution mechanism, because blockchains are traditionally not good at execution.
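A toy model of the escrow enforcement described here: funds stream from escrow to the provider as service is delivered, and when the escrow can no longer cover the next interval, the lease closes and the workload is undeployed. This is a sketch of the idea, not the real on-chain logic.

```python
# Toy model of escrow enforcement: the tenant prepays into escrow, funds
# stream to the provider hour by hour, and the lease closes (workload
# undeployed) once the escrow runs dry. Purely illustrative.

def settle(escrow: float, price_per_hour: float, hours: int):
    """Return (paid_to_provider, remaining_escrow, lease_still_active)."""
    paid = 0.0
    for _ in range(hours):
        if escrow < price_per_hour:        # out of funds: enforcement kicks in
            return paid, escrow, False     # workload gets undeployed
        escrow -= price_per_hour
        paid += price_per_hour
    return paid, escrow, True

paid, left, active = settle(escrow=10.0, price_per_hour=1.5, hours=8)
# Six full hours can be paid (9.0); on the seventh, 1.0 < 1.5, so the lease closes.
```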
Craig: 12:00
That's right, but the enforcement doesn't happen on the chain either, does it?
Greg: 12:09
Well, enforcement is really pulling the money back. If the provider fails, the provider doesn't get paid. If the tenant fails, the money goes to the provider. So that sort of mediation happens on the blockchain. Ultimately, it's the providers that will comply, as long as they agree with the protocol.
Craig: 12:41
Yeah, but when you said that once a match is made it's peer-to-peer, that's outside of the blockchain, right?
Greg: 12:46
So what happens between the provider and the tenant is up to the provider and the tenant, right? A provider can choose to give the tenant more resources if they want to, but the question is, why would they?
Craig: 12:59
No, that's right. Just on the enforcement, I don't quite understand. If you have compute resources and I have a workload, I find you through the Akash network, through the blockchain, and then I'm sending you my workload off-chain, privately, peer-to-peer. How does the chain know whether or not something's gone wrong? Do I go back on the chain and report it, or how does that happen?
Greg: 13:51
Yeah, the first obvious thing is that you didn't get the resources you need. There's a verification mechanism the chain enforces on the provider; there's checking of all kinds of things: hey, did you get the resources you said you're going to need? If the provider did not deliver, the contract gets yanked and you get your money back. Obviously, once you get the resources you don't usually have a problem, though you can have problems later. If your workload gets dropped by the provider, the lease gets yanked and you get the money back. So there are certain aspects of enforcement, and I use this word loosely, because no one can control you; that's the whole point here. But if you do not comply with the contractual obligations, your contract gets yanked.
Greg: 14:42
Now, is there indemnity, in the sense of, well, what happens if some nefarious activity occurs? Yes, there's absolutely indemnity with a certain category of providers that are audited. Not all providers are; a provider can be anonymous by default, but people are reluctant to deploy on someone without knowing who they're deploying to. It could be anyone. So there is a degree of identity exposed from the provider to the tenant in case you need indemnity.
Greg: 15:18
There's also something called TEE, or Trusted Execution Environment, where, if you need to run your entire workload in a fully private, fully encrypted mechanism, all the way from the chip level, with even chip-to-chip communications encrypted, you can use TEE. Now, with TEE, of course, the challenge is that there's a little performance overhead because of the encryption, but it gives you security guarantees. If your application requires that level of security guarantees, you can take advantage of TEE so that nefarious activity wouldn't occur.
Craig: 16:14
You were talking earlier about the data centers around the world that are independent, that are not part of the hyperscalers. Can you give us a sense of scale? Do the hyperscalers account for 80% of compute in the world and these independent data centers for 20%, or is it the other way around? Not what's on the Akash network today, but what your addressable market is, and how that compares to the Googles, AWSs and Azures of the world.
Greg: 16:50
Yeah, so we measure it by energy. There's no direct way to measure compute, because there are too many dimensions to measure, but if you look at the amount of energy being used, you can get a fairly good idea. There are about 1,000 30-megawatt data centers, about half of which are located in the US; 30 times 1,000 is 30,000 megawatts. And there are about 10,000 data centers of one megawatt or more, so between one megawatt and 30 megawatts. Of those one-megawatt-plus data centers, I think about 60%, if I remember correctly, are hyperscalers and 40% are non-hyperscalers, and the hyperscaler share is growing. But hyperscalers are limited by two main things: one is energy, and the second is fresh water. You need good energy and clean water to build a hyperscale data center, and those take a very long time to acquire.
Greg: 18:25
In the United States, our entire grid capacity is around 1.2 terawatts, or 1,200 gigawatts, and we are not good at building more energy capacity, largely because of regulation. If you look at the interconnect requests, meaning the new supply that wants to connect to the grid, there's about 1.9 gigawatts of capacity coming on board in the next 14 years, and they're very, very slow to connect to the grid. There are all kinds of reasons why, from aging grid infrastructure to regulation. And if you want the most efficient power, a nuclear reactor, the old-school reactors can produce about 800 megawatts to 1.2 gigawatts of capacity; the last one we built in the US took about 14 years and about $32 billion, so it's very, very expensive to build. Also, with the new regulations, the Gen 3 reactors cost about 10 to 15 cents per kilowatt-hour, I believe, which is prohibitively more expensive than natural gas.
Greg: 19:45
It's a very complex and very challenging environment in which to bring on more data centers. So there's a big question: how can hyperscalers sustain building more data centers? In fact, NVIDIA tried to build a hyperscale data center, and NVIDIA owns all the chips. You may ask, why wouldn't NVIDIA just go build a data center? They couldn't get more than 20 megawatts of capacity. I know this for sure because I know the person who tried to build it at NVIDIA. So it's very, very hard to get capacity. And the last nuclear reactor we had in play was Three Mile Island in Pennsylvania, which was snapped up by Microsoft. Supposedly the Stargate project is supposed to bring more nuclear, but let's see whether that happens.
Greg: 20:31
So your better bet, I think, is: can we actually build non-hyperscale, sub-10-megawatt data centers? At sub-10 megawatts you can get away with highly dense solar and wind; you can do renewables for 10 megawatts, and that tends to be a bigger trend now. So I'm extremely skeptical that we'll be able to move as fast as we want in the hyperscaler market, but I think we have a bigger chance to move in the second-tier market, which is one megawatt to 10 megawatts, and I think we now have an enormous opportunity in 100-kilowatt to one-megawatt data centers too. These are small, modular data centers; they can fit in a container. Microsoft is experimenting with them. There are all kinds of innovative things you can do in terms of cooling, placement and whatnot, and the advantage you get with a smaller 100-kilowatt to one-megawatt data center is tapping into the distributed grid. There's solar energy everywhere, right? So we actually wrote a paper at Akash; we did this analysis.
Greg: 21:51
Okay, if AI is going to be so important in our lives, I want everything in my house to be agent-driven. This conversation right here should be recorded by an agent and processed into insights. I have a one-year-old at home; I want to know everything she's doing while I'm working, and hopefully have an agent that will warn me if she's getting into things she shouldn't be getting into. I want the whole house to be automated, every conversation to be recorded. But I would hate for those conversations to be stored in a cloud, because I do not trust anything that leaves my home network, as no one should. Now, can I have a sovereign AI in the home? I think most people would want an AI in the home as long as it guarantees privacy, and I'm building that. But okay, what do I need to get the latest and greatest AI in the home?
Greg: 22:45
Well, DeepSeek 671B can run very well on H200 clusters; it needs a cluster of about eight H200s, and that cluster costs about half a million dollars. So we did a feasibility study: is there any way you can have sovereign AI in a semi-professional data center in the home, one that takes about 30 kilowatts of energy, in a way that is cost-efficient? And the answer was yes, absolutely.
Greg: 23:19
If we can acquire about five of these H200 eight-chip clusters, called HGX clusters, that's 40 chips, each earning about $2.30 per hour at 80% utilization, with 20% overhead for cooling, your CapEx plus OpEx can be recovered within five years by placing them on Akash. What you're doing is literally allocating one HGX cluster (HGX has eight chips) to your home use, so that one is completely dedicated, 100% utilized for the home. The other four HGX clusters you offer on the market at 80% utilization, since you already have the cooling and energy infrastructure.
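The payback math quoted here can be checked with rough arithmetic. The figures come from the conversation; how they combine (per-GPU pricing, overhead netted from revenue) is my assumption, so treat this as an order-of-magnitude sketch rather than the actual study.

```python
# Rough check of the home-cluster payback math quoted above. Figures come from
# the conversation; how they combine (per-GPU pricing, cooling overhead taken
# off revenue) is assumed, so this is an order-of-magnitude sketch only.

clusters = 5                  # HGX clusters, 8 GPUs each
cost_per_cluster = 500_000    # roughly "half a million dollars" per cluster
rented_gpus = 4 * 8           # one cluster kept for the home, four rented out
rate = 2.30                   # $/GPU-hour
utilization = 0.80
cooling_overhead = 0.20       # fraction of revenue lost to cooling/energy

capex = clusters * cost_per_cluster
gross_per_year = rented_gpus * rate * 24 * 365 * utilization
net_per_year = gross_per_year * (1 - cooling_overhead)
payback_years = capex / net_per_year

print(f"capex ${capex:,}, net ${net_per_year:,.0f}/yr, payback {payback_years:.1f} yr")
```

With the cooling overhead deducted, the payback comes out around six years; without deducting it, roughly 4.8 years, which is close to the five-year figure quoted in the conversation.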
Greg: 24:10
Now, we can further reduce the cost by having solar panels. The big challenge of solar is obviously storage, because solar only comes for eight hours a day, or realistically six hours if we're lucky. Storing, say, 30 kilowatts of capacity for over 10 hours requires a 300 kilowatt-hour battery, which is prohibitively expensive; it'll cost you about $200,000. But we can sell energy back to the grid: overproduce and sell the surplus back. In Austin you get about 3 to 4 cents a kilowatt-hour, depending on your region. That way you can pay for the compute by tapping back into the grid. The grid will sell it back to you at 10 cents. I know it's a scam, the grid sells at 10 cents and you sell at 4 cents, but it's still very feasible. So we did all this study on tapping into a semi-professional home network, and this will fit in a 42U rack that goes in a closet.
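The storage sizing is worth making explicit: a battery is rated in energy, so carrying a 30 kW load through 10 dark hours needs 300 kWh. The grid buy/sell prices below are the rough figures quoted in the conversation; the round-trip spread is what you pay for skipping the battery.

```python
# Quick check of the solar-storage numbers above. The battery is sized in
# energy (kWh): sustaining a 30 kW load for 10 dark hours needs 300 kWh.
# Prices are the rough ones quoted in the conversation (Austin figures).

load_kw = 30
dark_hours = 10
battery_kwh = load_kw * dark_hours           # 300 kWh of storage needed

sell_back = 0.04    # $/kWh the grid pays you for surplus solar
buy_price = 0.10    # $/kWh the grid charges you at night

# Selling surplus daytime solar and buying back at night loses the spread,
# but avoids a roughly $200k battery purchase:
spread = buy_price - sell_back               # $0.06/kWh round-trip cost
nightly_cost = battery_kwh * spread          # cost per night of grid round-trip
```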
Greg: 25:27
If AI really has to enter the home, which I think it will, that's when we're going to see an explosion of these data centers. It could be an office; it could be anyone who wants to get a rig; it could be someone at a university. As long as you have solar, you can run very efficiently. There are about 7.2 million of these data centers; we peaked at about 8.6 million in 2017. Obviously, the cloud came and killed a lot of them, but I think there's going to be a resurgence of these data centers, for privacy, for ownership, for cost reasons. And that's one of our goals too: to have decentralized AI, and we're going to achieve that by decentralizing energy production and energy consumption as well.
Craig: 26:13
Right, but for the time being they're not in the home, primarily; they're small data centers scattered around the world that sign up. If I have a workload, do you have an orchestration layer that routes my workload to the right data center, or do I have to do that on my end?
Greg: 26:42
Yeah, so Akash wouldn't do the data center selection for you, by design, because the blockchain is not application-aware, for privacy reasons. Remember, a blockchain is a public chain. In order for it to route, it would have to be very application-specific, because every application has a different way of scaling. So it has a bidding engine, but it doesn't have a matching engine by default, and application-specific information is one reason.
Greg: 27:17
A provider might also come back with better resources than you asked for, or better price-performance, because a lot of times people don't actually know what they need. That's why you go to a store and think, oh, I thought I needed this, but I need something else. And sometimes you can compromise: you don't get exactly what you want, but there is something equally good that you should be able to purchase. That's why it's very, very hard to develop a matching engine for humans, and so Akash doesn't have automatic routing. Now, you can lease from multiple data centers on Akash and have your own routing mechanism, bring your own router, as we call it. But that again is very application-specific, because you understand the application way better than the infrastructure provider does. Akash gives you the sovereignty, control and flexibility to route your application anywhere.
Craig: 28:14
So you put a bid, an RFP or whatever, on the chain. It's seen by all the providers attached to the network, they bid on it, and then you select whichever one you want, or you can split your workload between multiple providers, I would guess. Two questions. First, in the beginning you're going to get a pretty reasonable number of bids, but as this network grows, the day could come when you get more bids than you can reasonably go through. Is there going to be an application outside of the chain that can sort through these? Is that something you guys are going to offer, so that people can find the optimal partner or compute provider?
Greg: 29:29
Yes. Bid selectors, we call them. It's on the roadmap, so you can select a bid-selection strategy that's optimal for you, and these are pre-created strategies, optionally. It can be, hey, cost-first, optimal for cost; latency-first, optimal for latency; or a combination of cost and latency, cost first and latency later. You can have all kinds of rules, a rules-engine-based approach. In fact, we're looking a little beyond a rules engine, at an agent-based mechanism where the agent can adapt to different situations: since the agent knows your application, it can read your code, understand how it works and make the appropriate choice for you, instead of a rules-based system, which can be fragile. So yeah, we definitely have that planned. In fact, we have applications built on top of Akash.
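The pre-created strategies Greg mentions (cost-first, latency-first, or a blend) amount to pluggable ranking functions over the incoming bids. A minimal sketch, with hypothetical bid fields, might look like this:

```python
# Sketch of the "bid selector" idea: pluggable strategies ranking incoming
# bids by cost, latency, or a weighted mix. Field names are hypothetical;
# Akash's real roadmap feature may look quite different.

def cost_first(bids):
    return min(bids, key=lambda b: b["price"])

def latency_first(bids):
    return min(bids, key=lambda b: b["latency_ms"])

def weighted(bids, w_cost=0.5, w_latency=0.5):
    # Normalize each dimension to [0, 1] before mixing so units don't dominate.
    max_p = max(b["price"] for b in bids)
    max_l = max(b["latency_ms"] for b in bids)
    score = lambda b: w_cost * b["price"] / max_p + w_latency * b["latency_ms"] / max_l
    return min(bids, key=score)

bids = [
    {"provider": "dc-a", "price": 1.2, "latency_ms": 90},
    {"provider": "dc-b", "price": 2.0, "latency_ms": 20},
    {"provider": "dc-c", "price": 1.5, "latency_ms": 40},
]
```

Here cost-first picks the cheapest site, latency-first the fastest, and the blended score favors a middle option that is decent on both axes, which is the kind of trade-off a rules engine (or, eventually, an agent) would make for you.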
Greg: 30:24
Different providers build on Akash, like Prime Intellect; NVIDIA actually uses Akash, they have their own product called Brev; and there's a product called Venice. These are all different products that have their own selection engines, because they auto-scale. The idea of Akash was to give you primitives; it's like Lego blocks. It gives you all these cool pieces, and as a programmer you can decide what to do with them.
Greg: 30:50
There are a lot of people who use Akash directly, but most of our usage comes through distributors, meaning the NVIDIAs, Prime Intellects and Venices of the world, and Bitmine, a whole lot of things out there. They offer their own flavor of Akash with what they think is the best winning strategy for the bids. Absolutely, there should be a bid-selection engine. Will Akash provide it? We'll probably give you options that make your job easier, assuming you're a power user and understand how these bids work. But we want to be as explicit as possible, in terms of granularity, so that you have full control over who you get, and I think that is a very important capability.
Craig: 31:37
And you mentioned price and latency. When the bids come in, do they come with some metric that measures latency for where you are, where your workload is?
Greg: 31:52
You get everything you would need to make the selection about where the workload will be placed, and you can write a latency checker that checks latency depending on where your users are. Typically, you would analyze your user traffic and select a mechanism that picks the data center such that the 95th percentile of your user activity can be served within 50 milliseconds of network latency. That's really what you want to do, but those controls need to be written and specified based on your application.
Greg: 32:35
For some applications this is a very complex thing, because if you have hyperscale applications like Facebook, you most definitely have different shards in different regions. California users go to California, New York users go to New York, and, more importantly, the data should be available in those local regions. And what happens when a New York user tries to access California data, or when they travel, when a New York user goes to California? So this latency-based selection is not as easy as one would think. It's easy when the application is simple, but as it gets more complex it gets very, very challenging. You can't just have a latency engine, because it may not actually give the right answer for the application.
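The simple case of the latency checker, before sharding complicates it, might look like this: measure per-user round-trip times to each candidate site and keep only the sites whose 95th percentile stays under the 50 ms budget. Names and measurements below are invented for illustration.

```python
# Sketch of the latency-checker idea: pick the data center whose 95th-
# percentile latency to your observed user traffic stays under 50 ms.
# Data center names and measurements are made up for illustration.
import math

def p95(samples):
    s = sorted(samples)
    k = math.ceil(0.95 * len(s)) - 1     # nearest-rank percentile method
    return s[k]

def pick_datacenter(latency_by_dc, budget_ms=50):
    # latency_by_dc: {dc_name: [per-user round-trip samples in ms]}
    ok = {dc: p95(xs) for dc, xs in latency_by_dc.items() if p95(xs) <= budget_ms}
    if not ok:
        return None                      # no site meets the latency budget
    return min(ok, key=ok.get)           # best of the compliant sites

measurements = {
    "us-east": [20, 25, 30, 35, 120],    # one distant user blows the tail
    "us-west": [30, 32, 35, 40, 45],
}
```

In this toy data, us-east looks fast on average but its tail latency fails the 95th-percentile budget, so the selector picks us-west, which is exactly the tail-focused behavior described above.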
Craig: 33:23
Yeah. And the other thing you mentioned is that you may not want to put certain information with a hyperscaler because you're concerned about privacy. How do you know how secure these compute resources on the Akash chain are? You may be accessing a small data center in Hong Kong; you're not there, you don't have people looking at their security protocols. How do you ensure that you're secure?
Greg: 34:10
So there are two main mechanisms Akash provides. One is a decentralized auditing mechanism: if a provider claims, hey, they're HIPAA compliant, or whatever standard compliance they have, or that they have a tier-four data center in terms of redundancy and whatnot, auditors go and verify that off-chain. They literally run applications, they run audit checks, they check physical proof and paper proof, they actually go verify, and they post their results on-chain, saying, I have verified that this provider is who they say they are, their identity. More importantly, if something goes wrong, you want to be able to go after them. All that information is posted on-chain, in fact verified by multiple auditors. So as a tenant you can choose the auditors you want to rely on, and these auditors are public figures; Overclock Labs, a core contributor to Akash, is one of the auditors, for example.
Greg: 35:08
The second mechanism is something called TEE, or Trusted Execution Environment. What TEE does is give you a non-custodial way to encrypt the runtime, not just the transport. Your transport is obviously encrypted, meaning communication between you and the provider, but what's not encrypted is what's in memory, because that's hard to encrypt. Anyone with physical access to the machine can theoretically, if they're talented enough, look inside at what's running. To avoid that, you want to encrypt the memory itself, and that's achieved through a trusted execution environment. The trade-off, obviously, is that you need more resources, because encryption needs more compute, more from the GPUs. If you're doing AI, NVIDIA has an amazing TEE mechanism in their newer chips, the H100s and H200s, where they encrypt memory within the chip itself and also encrypt the transport between chips over NVLink. NVLink is 3.2 Gbps; it's very, very fast, so it may slow things down a little. Right now we're seeing about 10% overhead, which is not a bad trade-off.
Greg: 36:25
That matters if you want privacy, especially if you're on an unknown data center. But if you're deploying on something like Equinix in the US, a professional data center, you don't have that problem; you're more relaxed about whether you can trust the data center. Ultimately, yeah, you have to trust somehow, either through encryption or through auditability. But this way you have a lot more tools at your disposal than you normally get when you go to a data center directly. Akash gives you all this infrastructure that you don't normally get going direct.
Greg: 37:08
Going to a hyperscaler is just trusting the brand. You say, okay, Amazon will not do something. You're trusting that Amazon will not do something. There's a good amount of truth to it, but we know how this trust can be corrupted; we just don't know what's happening. If you're a big company trying to use Amazon, they'll let you open the boxes. If you're the Department of Defense, they'll literally let you audit everything. But if you're a small shop, no one cares; you wouldn't even know if someone has access to your data.
Greg: 37:44
It famously happened with ChatGPT: one of their engineers said in an interview, yeah, we look at users' prompts. I felt violated, because, okay, my prompts, good Lord, you know, all kinds of things, and I'll be judged by the person looking at my prompts. That feels like a violation of privacy, right? And if you have anything on the cloud, there's no guarantee these people aren't going to look at your stuff, unless they adhere to certain standards, comply with them, and give you some degree of indemnification. That's why I feel that if you want ultimate privacy, you have to run at home; there's really no way around it. The second-best thing is a TEE if you can't run at home, with a little overhead. And the third-best thing is probably ultimate trust in the provider. But Akash gives you that TEE, which is much better than a cloud provider.
Craig: 38:38
Yeah. And then the other question is: if I'm running a workload and my data center goes down, or there's an outage and the backup doesn't happen, the generators don't kick in or whatever, what happens to my workload?
Greg: 39:05
It dies, so you have to know how to be redundant per application, and that's one of the big areas. You have the same problem with the cloud today, too. Amazon had about 200 outages last year, right? And we all saw what happened with CrowdStrike: somebody pushed bad code and the entire US airline infrastructure came to a standstill because people couldn't fly. Centralized systems generally have a lot of faults that they don't expose. So one of the key design patterns for using Akash effectively is to be redundant from the get-go.
Greg: 39:42
Akash makes it cheaper and more practical to be redundant, but there's really no silver bullet, and outages happen all the time. That's why, when someone says they're a tier-three or tier-four data center, it means they have dual ISPs, ideally with a Starlink backup. I'm happy to share the paper we wrote recently on how to design a home data center: dual ISPs with a Starlink backup, and dual generators. You want a main diesel generator with automatic switchover as well as a gas generator, and, ideally, solar panels.
Greg: 40:19
And you have UPSs, uninterruptible power supplies, and so on. All these aspects are verified, which reduces the chance of faults. But faults happen; you cannot prevent faults. What you can do is come back up faster. As Mike Tyson says, everyone has a plan until they get punched in the face. You will get punched in the face when you're running infrastructure; that's the reality, and anybody who runs infrastructure will tell you there will be faults, no matter how hardened you think your data center is. It ultimately comes down to redundancy. You can run in a quorum, for example. If you have a database, well, don't run a single database.
Greg: 41:03
Run a master-slave, sorry, you're not supposed to use those words, a leader-follower-style database, where if the leader goes down, the followers elect a new leader, or if a follower goes down, you spin up a new follower, so something is always alive. So you need to be a little more professional to use Akash natively. There are applications on top of Akash that make all these things easy: you don't have to think about redundancy, you just click a button and it's taken care of for you. But that's very application-specific. Akash doesn't provide those redundancy mechanisms by design because, again, it depends on the application.
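The leader-follower pattern described here can be sketched in a few lines of Python. This is a toy illustration, not Akash code or a real database: the replica names are made up, and real systems use a consensus protocol such as Raft rather than "promote the first healthy node."

```python
class Replica:
    """A toy database replica that can act as leader or follower."""
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}

class Cluster:
    """Minimal leader-follower cluster: if the leader dies, promote a follower."""
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.leader = self.replicas[0]

    def write(self, key, value):
        if not self.leader.alive:
            self.elect_new_leader()
        # Writes go to the leader, then replicate to the healthy followers.
        self.leader.data[key] = value
        for r in self.replicas:
            if r.alive and r is not self.leader:
                r.data[key] = value

    def elect_new_leader(self):
        # Promote the first healthy replica (real systems use Raft/Paxos here).
        for r in self.replicas:
            if r.alive:
                self.leader = r
                return
        raise RuntimeError("no healthy replicas left")

cluster = Cluster(["db-1", "db-2", "db-3"])
cluster.write("balance", 100)
cluster.replicas[0].alive = False   # the leader fails
cluster.write("balance", 80)        # failover happens, the write still succeeds
print(cluster.leader.name)          # → db-2
```

The point of the pattern is exactly what Greg describes: the failure is absorbed by the cluster instead of killing the workload, at the cost of running extra replicas.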
Greg: 41:46
A lot of times when you're scaling databases, you need to think about consistency: what level of consistency do you want? When you have multiple nodes in a database, you want to make sure all the nodes present the same data. But if you want to be redundant, you have to put those nodes in different locations around the world, and then they won't present the same information instantly, because there's latency. East to west there's about 300 milliseconds of latency, right?
Greg: 42:18
So if you want an immediately consistent database, you have to make sure the data is replicated to all nodes, which takes about 300 milliseconds. When you write, you can't read back immediately; you have to wait those 300 milliseconds. That's immediate consistency. If you don't care about that, eventual consistency is fine: you write a data entry in California, you assume the reader is reading it back from California and not from New York, and you're fine with that. If you're building social media, you're fine with that. But if you're building something like a banking application, someone who withdraws cash from an ATM in California should not be able to withdraw again within 300 milliseconds in New York; you need immediate consistency. That's why scaling and redundancy planning is very application-specific, and why Akash doesn't make the choice for you: you design your redundancy the way you want it.
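The stale-read problem Greg describes can be made concrete with a small, hypothetical Python simulation (not Akash code): a write in one region is visible locally right away but only reaches the other region after the replication delay. The 300 ms figure and region names come from the conversation; everything else is an illustrative assumption.

```python
# Toy simulation of eventual consistency across two regions.
REPLICATION_DELAY = 0.3  # seconds, the cross-country figure quoted above

class Region:
    def __init__(self, name):
        self.name = name
        self.store = {}

class EventuallyConsistentDB:
    def __init__(self, regions):
        self.regions = {r: Region(r) for r in regions}
        self.pending = []  # (visible_at_time, region, key, value)

    def write(self, region, key, value, now):
        # The write is visible immediately in the writer's own region...
        self.regions[region].store[key] = value
        # ...and queued for the other regions with replication lag.
        for other in self.regions:
            if other != region:
                self.pending.append((now + REPLICATION_DELAY, other, key, value))

    def read(self, region, key, now):
        # Apply any replication events that have "arrived" by `now`.
        arrived = [p for p in self.pending if p[0] <= now]
        self.pending = [p for p in self.pending if p[0] > now]
        for _, r, k, v in arrived:
            self.regions[r].store[k] = v
        return self.regions[region].store.get(key)

db = EventuallyConsistentDB(["california", "new-york"])
db.write("california", "balance", 100, now=0.0)
print(db.read("california", "balance", now=0.0))  # 100: local read-after-write
print(db.read("new-york", "balance", now=0.1))    # None: replica still stale
print(db.read("new-york", "balance", now=0.4))    # 100: replication caught up
```

The banking example is exactly the middle read: for 300 ms, New York would happily let the same cash be withdrawn twice, which is why that application needs immediate consistency instead.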
Craig: 43:24
So this is new, right? I mean, how long has the network been operating? About four years? Four years, yeah. And, as you said, the demand for compute is presumably going to rise sharply as AI spreads through the economy. Do you have a target for how much compute you'll have on the network by a given date, or any sort of roadmap that way?
Greg: 44:05
Yeah, so we're growing at 10x every year now, which has been phenomenal, and we hope to continue growing 10x. Right now we want to get to 10,000 GPUs by the end of the year, about 100,000 GPUs by the end of next year, and about a million GPUs within the next three years, which we think we can do. And we're doing really well; we're growing at about 20% month over month now.
Greg: 44:39
And, more importantly, it's not about how much compute you can get; it's about how much of the compute on the network gets used. Utilization rate is extremely important, so we scale responsibly: every time a new provider brings compute to Akash, they should be able to sell out their inventory. If they're not selling inventory, there's no point in coming to Akash. So it's very important for us to keep a good equilibrium between the providers' ability to sell compute and the tenants' ability to scale their workloads. Then again, if providers sell everything and there's no inventory, tenants will be upset. That's why you have to be very careful in how you scale the system. Our utilization rate right now is 70%, which is very healthy. That rate can go up as we scale, because there will be more resources in the pool overall. So utilization rate is our constraint; we don't scale without it, and we're doing very well now.
Greg: 45:49
I think our next big goal is transitioning from a resource market to a services market, a resource economy to a service economy. Right now, think of Akash as a marketplace for resources, like a commodities exchange where you can trade gold and whatnot, but it is not yet a marketplace for services. A lot of times when you're building AI systems, or any systems for that matter, you use several services: databases, vector databases, inference services, agent-hosting platforms, a whole lot of services that you orchestrate together. And when you look at the cloud providers, a lot of those services are open source systems. Look at Redis, for example: ElastiCache on Amazon is Redis. The SQL database services Amazon offers are MySQL or Postgres. These are open source projects, and Amazon essentially takes them, white-labels them and sells them. We feel that if we empower open source developers to provide their services on Akash, we can effectively have a sustainable open source ecosystem.
Greg: 47:06
Right now, a big challenge for open source software is sustainability. Take Docker, for example: Docker is one of the most widely used pieces of open source container infrastructure, but they couldn't figure out a business model. After raising at a billion-dollar valuation, they sold for peanuts to Mirantis, and they're no longer an open source company. Or Kubernetes: 80% of the globe uses Kubernetes, and they can't keep their contributors together anymore. Besides Linux, which has a very special founder, we don't have any example in the wild that has remained purely open source and successful. We have to change that. We have to create an economy where open source contributors can sustain themselves building open source software, and I think that's what Akash is going to transition to with the services economy. That's coming next year, I think.
Greg: 48:08
With that, we also measure revenue per GPU. Revenue per GPU is right now at $20 per GPU per day, which is fairly good; we were at $10 a year ago, so we did a good job increasing it. With additional services we can further improve revenue per GPU to about $50 to $100; that comes with premium services on top, and that revenue goes to open source developers. So we don't just look at pure capacity; we also look at how we increase the value of each resource we provide, and at how we scale while maintaining good utilization. That's very important for the health of the network: the healthier the network is, the higher our chances of success in the future.
Craig: 49:00
So, yeah. I mean, once you have the services up, presumably you'll have people who can onboard a tenant, as you call it, or a user, walk them through the system, get them connected and all of that. But for now, how does somebody use the network?
Greg: 49:29
You can go to akash.network, and there's a button that says Deploy. There's an application called Console; Console on akash.network is a great way to get started. There's a $10 free trial if you want to try it. It's actually very good; I highly encourage folks to try it. No signups, it's very straightforward, and people love it.
Greg: 49:52
If you want an easier system, with SSH-style access, you can go to Prime Intellect; they use Akash underneath. You request a resource and you're off. So there are several ways, but I think the best way is to go direct, because you get the most cost advantage. Right now I don't know if there are any H100s or H200s left on the network, because they go like hotcakes, but H200s are $1.99 an hour, which is the lowest; I believe Amazon is $4 to $6 per hour, so about three times more. And the reason it's cheaper, I'll send you the link to a cost analysis we did, is that you can amortize your investment with a decent utilization rate fairly easily. So there's no reason you should be paying Amazon margins; Amazon makes record-level margins off you. Remove all the margins and the resources are actually fairly cheap. That's really what Akash provides, and that's why resources are very cheap on Akash. But they do sell out; the H200s are gone.
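The amortization argument can be roughed out with back-of-the-envelope arithmetic. The hourly prices below are the ones quoted in this conversation; the $30,000 hardware cost, the $5/hr hyperscaler midpoint, and the 70% utilization are illustrative assumptions, not figures from Akash.

```python
# Back-of-the-envelope GPU economics, using the hourly prices quoted above.
HOURS_PER_YEAR = 24 * 365

def payback_years(hardware_cost, price_per_hour, utilization):
    """Years until rental income covers the hardware purchase (assumed inputs)."""
    yearly_revenue = price_per_hour * HOURS_PER_YEAR * utilization
    return hardware_cost / yearly_revenue

# Provider's view: a hypothetical $30,000 GPU rented at $1.99/hr, 70% utilized.
print(f"payback: {payback_years(30_000, 1.99, 0.70):.1f} years")

# Renter's view: cost of 1,000 GPU-hours at each quoted/assumed price.
for label, price in [("Akash H200", 1.99), ("hyperscaler (assumed)", 5.00)]:
    print(f"{label}: ${price * 1000:,.0f} per 1,000 GPU-hours")
```

Under these assumptions the hardware pays for itself in roughly two and a half years, which is the sense in which a marketplace without hyperscaler margins can still be sustainable for providers.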
Greg: 51:09
Our utilization rate on H200s was about 98%, last we saw. That's the problem with hot chips: you don't really have availability, and we're trying to improve that as much as possible. But H100s are available for $1.20, which is also very competitive compared to Amazon. So I highly encourage folks to check it out. If you want to see the pricing, you can go to akash.network, where you'll see the GPU pricing and find it extremely competitive. You also get to see how much availability there is on the network, so you can plan.
Greg: 51:44
I do warn you, it's very addictive, especially when you automate. A lot of our users just automate the hell out of it, because the moment a GPU shows up it gets swept away, especially H200s. DeepSeek runs really well on H200s. So it's very addictive; people are constantly bidding to get into this really cool game. I think there's also somebody building futures, where they purchase the compute and resell it at a higher rate. So it's a very fascinating market, actually.
Craig: 52:18
Lastly, you're not the only decentralized cloud computing platform out there. How do you regard yourself among the competition?
Greg: 52:33
We're definitely the leaders. We were the first, and we're the leaders. There are a lot of copycats; success breeds copycats, and that's the reality. We are not the only one, but we are the first and the biggest open source cloud. The new copycats are actually closed source; they take our code and fork it. We've seen that several times. Not to disregard the competition, but I think it's a good thing that people are looking at our model, seeing the success behind it and trying their own flavor. And they obviously want to stay closed source because, well, it's advantageous: you can ship code faster.
Greg: 53:16
Akash is very decentralized. From a pricing standpoint we're also much better. We have much better SDKs and much better user experience; we're better in several respects.
Greg: 53:33
And we have about 30,000 developers in our Discord. Anytime you have a problem, you just go to the Discord and it gets solved really quickly. So it's an enormous community with enormous participation: about 500 contributors come and build Akash. If you see a feature you want built, you can write a proposal yourself. So if you're an open source developer, if you like building open source, Akash is the platform for you, because it not only lets you use an open source system but also lets you build and get paid for it.
Greg: 54:05
If you have an idea you think you can run on Akash, there's, I believe, $25 million in the community pool that you can apply to for a grant and use Akash. A lot of university students are using Akash that way too. So it's not a company you're dealing with; it's a community you're going to be part of. That's the big difference between using Akash and using a company's products: it's not a corporation, it's a community. If you're the type of person who believes communities can solve societal problems, Akash is for you. But if you're one of those people who wants to pay a company that can leverage a decentralized stack as well, there are of course companies out there offering something similar to Akash that you can take advantage of.
Craig: 55:01
Is there anything that I didn't cover that you'd want listeners to know?
Greg: 55:09
I think I just want to emphasize why this model is going to be extremely important, and why people doing AI, especially training, should be considering some of these models. There were two big challenges for AI training. One was data: there's a limit to how much data you can get to train on. And the second was energy. We've solved the data problem. What we saw with DeepSeek is that you can use synthetic data, which is amazing; using synthetic data and a mixture-of-experts mechanism, you can actually solve the data problem. But what we cannot solve is the energy problem.
Greg: 55:53
I think that's why it's very important, if you're doing training, to focus on distributing your training runs rather than the traditional mechanism of centralizing them, because we're going to hit a cap in two years and we have no solutions. That's why we have $500 billion in supposed investment, with a lot of it going toward power infrastructure, but by the time we realize the benefits of that investment it's going to be too late. So I highly encourage folks to take decentralized, distributed training more seriously and look into the mechanisms employed by Nous Research, a top team that recently came up with something called DisTrO, and by Google DeepMind, which published a paper called DiLoCo.
Greg: 56:39
Several companies are taking their own approaches. One approach is reducing the amount of communication between the nodes so they can train better; others apply better verification mechanisms locally. I'd love to see more work, more approaches and more experimentation in this space. That's really going to benefit all of us, and take the power away from the OpenAIs of the world. If you want to really disrupt them, you have to think decentralization.
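The communication-reduction idea behind methods like DiLoCo can be sketched simply: each worker takes many local optimizer steps on its own data shard, and the workers synchronize only once per round (here, by averaging parameters) instead of communicating every step. The toy Python example below fits a one-parameter linear model with local SGD; it is an illustrative sketch of the general idea, not the actual DisTrO or DiLoCo algorithms, and all the numbers in it are made up.

```python
import random

random.seed(0)

# Synthetic task: learn w in y = 3*x from noisy samples, sharded over workers.
def make_shard(n):
    return [(x, 3.0 * x + random.gauss(0, 0.1))
            for x in (random.uniform(-1, 1) for _ in range(n))]

shards = [make_shard(200) for _ in range(4)]    # 4 workers, local data only

def local_steps(w, data, steps, lr=0.05):
    """Plain SGD on one worker's shard; no communication during these steps."""
    for _ in range(steps):
        x, y = random.choice(data)
        grad = 2 * (w * x - y) * x              # d/dw of (w*x - y)^2
        w -= lr * grad
    return w

w_global = 0.0
for _round in range(10):                        # only 10 synchronizations total
    # Each worker starts from the shared weights and trains independently.
    local_ws = [local_steps(w_global, shard, steps=50) for shard in shards]
    # The only communication: average the workers' weights once per round.
    w_global = sum(local_ws) / len(local_ws)

print(f"learned w is approximately {w_global:.2f}")  # should land near 3.0
```

Instead of 500 gradient exchanges per worker, this run communicates 10 times, which is the kind of reduction that makes training across geographically distributed, power-constrained sites plausible.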