• [Featured articles] Scale entanglement

    By Jeongkyu Shin

    This article originally appeared in Crossroads, May 2023. [^Editor's note]

    This article is translated from Korean.

    The original order of the essay is 2023 > 2015 > 2020 > 2017 > 2018 > 2019 > 2021 > 2022 > 2023.
    While the emotional flow follows that order, the essay has been re-edited in chronological order for the reader's understanding.

    After 2023, March 14th may no longer be known as Pi Day, but rather as Chatbot Day.

    It was the day when all the language models that had been in storage were simultaneously unleashed into the world. Starting with Google's release of PaLM fine-tuning + generative models on Vertex AI, followed by OpenAI's announcement of GPT-4, Microsoft officially confirming that Bing was already using GPT-4, and Anthropic's formal release of Claude bot, all within a span of 12 hours.

    That morning, after reviewing the GPT-4 tech report released by OpenAI, I left a post on Facebook about the technical aspects that impressed me.[1] A comment was left: "I feel both joy and pain seeing things I wondered if I would ever see in my lifetime becoming reality..." I replied, "Now, no one is interested in the Turing test anymore. Within a year, it has become a question of 'Isn't it obvious that it will pass without a wow point?'"

    When actually facing the keyboard, the task of organizing knowledge seems to have already left human hands. I tap into human stories to find the meaning of recording.

    To avoid sounding like an alien language, let me briefly touch upon the aspects of artificial neural networks necessary to understand this essay.

    A program that simulates the connections between neurons is called an artificial neural network. Neurons are grouped into layers, and the network is designed by overlapping these layers and creating connections between neurons in adjacent layers. Deep learning is a term used when there are many layers in an artificial neural network. The results of various artificial neural networks are called deep learning models, or simply AI models to sound more impressive.

    The connection strengths between neurons are called parameters[2]. The number of connection strengths is called the number of parameters. As the number of parameters increases, more memory is required, so it is said that the model becomes larger. Training a model involves connecting artificial neurons and adjusting the connection strengths between them so that the output for the input data takes the desired form. After training, an artificial neural network can mimic an extremely high-dimensional, discontinuous state space.

    Those are the basic terms. So, when should we start recalling the story?


    I founded Lablup Inc. The name was chosen as a pun, meaning both "lab eul up" and "lab | up". (eul is an adjective to modify a noun in Korean language. | is linux pipe command)

    People who had been suffering through their Ph.D. programs came together with the goal of creating a research automation platform in the field of computational science to alleviate the hardships of others. We believed that instead of clumsily running clusters by placing workload managers[4] on bare metal[3], we needed a computational environment that guarantees reproducibility and portability. The start was courageous, but there was no demand for a research platform. Within two months of founding the company, we learned the hard way that neither universities nor research institutions, despite easily purchasing equipment, are stingy when it comes to spending money on software. Universities lacked funds, but had an abundance of graduate students who would figure things out on their own when instructed. The industry did not yet have a large-scale demand for scientific computing. At the same time, we painfully learned that Ph.D. holders like us, who had only been in school, needed to go through a process of re-socialization to communicate like normal people.

    It wasn't just our lack of re-socialization that was the problem. It was the content of what we were saying. Stories about science advancing based on technology or stories about accelerating innovation based on computation were considered science fiction topics wherever we went. We were getting exhausted. Yet, as they say, people only see what they can see. The potential of deep learning models was clearly emerging. In fact, various movements had started to prevent deep learning technologies and their outcomes from being subordinated to capital. One of the representative organizations was OpenAI[5]. Such changes seemed to be evidence that we were heading in the right direction. It felt like the direction would become a bit clearer after just one more year.

    On the verge of our first anniversary, deep learning garnered significant social interest. Around the end of 2015, TensorFlow[6] was released to the world. As the first lecture material for the coding platform prototype, we translated the entire TensorFlow manual and uploaded it. When AlphaGo had its match in March 2016[7], our service crashed for the first time due to the influx of people. If it weren't for that match at that time, perhaps there wouldn't be a sequel to this story. Thankfully, because of that, the company survived.

    We needed a demo of a research platform performing massive-scale computations. Language models were a topic that required enormous computational resources and could be immediately tackled as a hobby since my Ph.D. program. In 2016, we released a chatbot created by placing a language model on the platform we were developing. It attracted a lot of attention.[8][9] The chatbot quickly became an in-house project. However, after just over a year, the chatbot project was shelved.


    In the fall of that year, Lablup decided to halt the development of language models, which had been a side project, and focus solely on Backend.AI, an AI cluster management platform.

    The shock of the Google Assistant demo I saw in Krakow, Poland, where I was invited by Google, was significant. The topic of that meeting, which was attended by just over ten people, clearly showed that language models had now become part of the arms race for resources, and without large-scale investment, it would be impossible to keep up with the changes thereafter. Over dinner with Donghyun Kwak, who attended the Google Developer Summit with me, and Sanghun Lee, who visited Krakow to attend a conference held at the same time and place, we discussed, "It seems like the future I saw in the field of physics will start here as well."

    The Manhattan Project strongly appealed to all of humanity through nuclear weapons that technology becomes power nearly eighty years ago. Physics was no longer an object of romance, but an object of investment. The era of physicists making a living, which began that way, led to changes in the fields of massive science, connecting to the space program and particle physics. On the way back to the accommodation that night, I sent a message to the team members, "Let's not develop language models anymore. From now on, we'll lack the funds to keep up."

    History always repeats itself. Then it's not difficult to predict what changes will occur in the future. It's just a matter of timing. We shared the opinion that 'the inflection point will probably be in 2020.'[10] By then, wouldn't it be possible to achieve profitability? It became the company's goal. Language models were advancing beyond LSTM-based machine translation. The AlphaGo shock became the source of countless jokes people made about AI. A tremendous number of "AI companies" were born. However, most of them became coin companies or metaverse companies two years later.


    The transformer[11] architecture began to be applied to various parts of all kinds of language models. In terms of letting us know 'what' to focus on, transformers solved many aspects of model context memory, maintenance, and emphasis. They could be used for encoding to put information into the state space or for decoding to extract information from the state space. Google introduced BERT[12], while OpenAI introduced GPT. Both were transformer-based language models, but they differed in their focus on the encoder and decoder, respectively. BERT focused on the encoder part, while GPT implemented an architecture that creates memory for causal relationships by linking the output to the input through the decoder, which is structurally different from BERT. BERT and GPT, and the later T5, no longer used labeled corpora. Using transformers, language models could be created by first training the model to learn the structure of language from the corpus itself and then fine-tuning it. It was still far from the end-to-end training philosophy of AI model development, which involves no human intervention in the data. However, the concept of data acquisition began to change from that point on. Quantity is more important than labeling. It was the beginning of general-purpose language models.

    BERT seemed to be able to replace most of the existing language models at an incredibly fast pace. Its overwhelming performance raised expectations for making significant improvements in various language tasks such as document creation, chatbots, and document analysis. However, as of 2018, BERT was a model so large that it was unimaginable to train. It seemed like no one outside of Google would be able to create a model of that size. But that moment was fleeting. Facebook quickly released RoBERTa[13], which was based on the BERT paper but increased in size. It was a symbolic action that simultaneously announced that TPUs were not necessarily required and that anyone with capital could participate in this race.

    The first bottleneck in increasing model size using GPUs appeared in GPU memory. Models could no longer fit on a single GPU device or be trained in time using a single device. Depending on the case, it became common to split models across multiple GPUs or distribute training across multiple computing nodes. Horovod and Distributed TensorFlow began to shine.

    Technology continues to advance, and the cost per unit of computational resources continues to decrease. If this progress continues, the popularization of AI will eventually take place, and at that point, price competitiveness, which is the same in all markets, will become the most important factor. "The era of price competitiveness will come for AI as well," I wrote, praying not to go bankrupt until then.

    • BERT, 340 million parameters
    • GPT, 110 million parameters


    For several years while creating a distributed processing and distributed training platform, I occasionally wondered, 'Could we be creating a platform with no demand?' After 2019, I no longer had such thoughts. From 2020, there was no time for such thoughts.

    As soon as the new year began, OpenAI announced GPT-2[14]. The GPT model, which focused on the decoding process of extracting information from the topological space, demonstrated remarkably stable text generation capabilities. GPT-2 became the basic code that anyone could use to create a language model. Along with PyTorch, Horovod, and Distributed TensorFlow, the difficulty of code access was rapidly decreasing. Google's XLNet and T5 (Text-To-Text Transfer Transformer) language models in 2019 seemed to have crossed the river of model sizes that were thought to be impossible for humans to surpass (by spending capital). Google strongly appealed that training T5 required enormous computational resources at the TPU level, emphasizing that it would require hundreds of NVIDIA's V100 units, which could be purchased with effort in the market. (A single V100 unit cost around 12K USD at the time.) Like BERT, T5 only released the paper and did not release the model. Google had the painful experience in 2017 when they released the BERT paper but did not immediately release the model alongside it (because training was not yet complete), and in that gap, Facebook preemptively released RoBERTa, which they had trained by increasing the size of the same model. Considering that they still did not release it, Google must have been confident that it would be difficult for anyone outside Google to reproduce the training of that model.

    At the end of 2019, we ended our long wandering and moved into our own office. The era of massive deep learning models was coming, and to prepare for it, we needed to collaborate with more people. The size of language models was increasing tenfold every year. As I carried the not-so-many belongings in boxes, I questioned myself. At this rate, in three years, the model size would increase a thousand-fold, but were we ready to handle a thousand times the workload?

    • RoBERTa, 350 million parameters
    • Transfer ELMo, 460 million parameters
    • GPT-2, 1.5 billion parameters
    • T5, 11 billion parameters


    Two months had passed since we moved to the new office. The winter was long.

    By the end of February, we were unable to complete the interior of the new office we had moved into. The wall finishing materials for the office, which were supposed to arrive from China at the end of February, never made it. The company's entire roadmap changed. All business trips to the United States were canceled. The office with unfinished interiors remained unfinished and guarded the empty space for the next two years.

    COVID-19 not only tore apart the company's future but also separated people. My first child started elementary school by lying diagonally against the floor, watching the tiger (meaning of strict in Korean) teacher on the EBS educational broadcast TV channel. I rolled around next to the child rolling on the floor. How busy had I been? Even while raising my children, it wasn't until I was trapped in the same house due to the coronavirus that I felt the reality of being a father. I wondered how long that sad yet strangely stabilizing time would last.

    That year, OpenAI released GPT-3. The theoretical foundation was not significantly different from GPT-2. However, one thing was vastly different: the size. It was a massive model with 175 billion parameters. Not only for model training but simply loading it onto a GPU was expected to occupy an entire NVIDIA DGX-2 supercomputing node. Unlike GPT-2, this time, neither the code for the language model nor the trained language model was released. Wow. Non-disclosure in the field of deep learning. Something was changing.

    There was a movement against the supremacy of model size. Does performance increase accordingly as the size of deep learning models grows? A debate began between researchers at Google and Meta. One side argued yes, while the other argued no, engaging in a word battle in the form of papers. However, this debate, which lasted from 2019 to 2021, did not last long. As the size of language models increased, interesting phenomena were discovered. Deep learning models had scaling laws[15]. Something happened around 100 billion parameters. Regardless of the model structure, at some point beyond 100 billion parameters, language models began to do unexpected things beyond simply generating coherent speech. The phenomenon called in-context learning allowed models to learn various knowledge on the spot without model training and derive logical conclusions. It was the beginning of the race surrounding large language models (LLMs).

    While language models were simultaneously immersed in the debate and discoveries surrounding the size problem, the introduction of deep learning in medical applications began at a tremendous pace. DeepMind's AlphaFold2 achieved high-accuracy structure prediction without Monte Carlo simulations, using only predictions. It reduced the required computation, which was a major challenge in the field of proteomics, to nearly one-thousandth of the previous level. From microscopic stages such as predicting coronavirus mutations, filtering vaccine candidates among synthetic substances, and predicting new synthetic structures to predicting transmission routes and estimating the number of infected individuals, the application of AI models expanded to various fields. Everyone started leaping without looking. The snowball of resource scale rolled at a tremendous speed.

    In the second half of the year, discussions about the scale of computational resources to increase model training speed took place. It was different from the existing competition to secure deep learning computational resources for research goals. Scale gave rise to operational and optimization demands, and thanks to that, we were able to achieve the 'profitability from 2020' that we had anticipated in 2017. The demand for platforms increased, but at the same time, we were forced to work from home, and most communication became text-based. Although many people joined us later, some of them were destined to become colleagues who had never met each other in person until the workshop in early 2023.

    The scale of compute resources was growing, and all eyes were on GPUs, but as the models got bigger and the number of GPUs increased, the GPUs became less of a bottleneck. The biggest challenge was data storage. In training, you need to feed data to hundreds of machines. The absolute speed of storage hasn't kept up with the growth in the number of GPUs. Most of the problems we had to solve in 2020 came from storage bottlenecks.

    Another kind of change, slower but deeper than the deep learning race, has been happening: it's become second nature to people of all generations to treat online relationships as normal relationships. And then there comes a moment when you wonder, "Does it really make a difference to me if the person on the other end of the line is human or not, as long as they speak good enough?"

    • T-NLG, 17 billion parameters
    • GPT-3, 175 billion parameters
    • Gshard, 600 billion parameters


    The race for developing large language models that followed T5 and GPT-3 was growing in fascination. The simplest way to find out if performance improvement continues as the size increases is to make it even bigger. Various theories emerged as to why large language models produce peculiar results, but the answer was still unclear. A hypothesis emerged that when the state space is sufficiently large, a kind of phase transition occurs in the process of handling information. One of the candidate explanations for why the transformer structure handles these tasks well is that the transformer structure is a special case of graph neural networks (GNNs).[16] Graph neural networks, which gained attention since 2018, are neural networks that learn the relationships of objects and are known to be very powerful in processing semantics or taxonomies.

    Microsoft's DeepSpeed framework[17], which is most commonly used for distributed model training, began to be widely used. DeepSpeed's ZeRo optimizer focuses on reducing GPU memory usage by distributing workloads across various hardware from CPUs to GPUs and partitioning model states. Open-source language models also emerged. OpenAI was no longer releasing models and was selling exclusive rights to use models. Due to the reduced accessibility, various language models appeared, but they could not meet the high expectations as they did not match the scale of large language models.

    The scale of GPUs handled by users easily began to exceed triple digits. Various massive-scale tests tailored to the actual workloads run by institutions became necessary. We started running language models again for system testing purposes, which we had sent to the realm of hobbies at the end of 2017. The largest Portuguese language model in the world was born on our platform and was briefly introduced in passing during the keynote at the NVIDIA GTC conference. In the same conference, a tutorial session titled "Fine-tuning BERT in 60 seconds" was held. BERT was no longer a massive model but a subject of practice.

    As the model size rapidly grew, the problems to be solved also changed. When models had to be split and loaded onto multiple GPUs, communication between GPUs became extremely important. GPUs not only shared memory access within a node but also increasingly communicated across multiple nodes. It became common to attach one InfiniBand, which transmits 200 Gb per second, to each GPU to create a GPU network.

    Amidst the complex and hectic changes, a thought occurred to me. The process of large language models learning 'language' is based on unclassified corpora. What does the large language model 'learn' in that process? Although corpora are used for the purpose of learning the structure of language, language cannot be separated from information. In fact, don't language models that have not been explicitly taught knowledge readily answer questions? Language itself is a protocol for humans to convey information to each other. The conversation process involves computing answers to data transmitted through the protocol and responding with data again. Then, is our perception of having developed an 'AI that converses well' really about developing an AI model that creates language well, or have we created something beyond that?

    The following year was going to be the first year of services that are only possible with AI, not services that have been improved with AI. However, no one was yet thinking about providing the outputs of large language models as services. That was a task for someone in the future.

    • GPT-J, 6 billion parameters
    • LaMDA, 160 billion parameters
    • PanGU-alpha, 200 billion parameters
    • Gopher, 280 billion parameters
    • Pathways, 530 billion parameters
    • Switch-C, 1.6 trillion parameters
    • Wudao 2, 1.75 trillion parameters


    The COVID-19 endemic was creating tremendous aftereffects. Numerous IT companies that had grown due to the special circumstances of the coronavirus and many companies that had tried to shift their offline operations online were dumbfounded by the demand for the metaverse, which suddenly disappeared like a mirage. The field of deep learning had not yet generated significant revenue sources. Numerous companies began downsizing their AI teams. Many researchers came out.

    It wasn't that there was no need for technological advancements in AI. The massive scale underlying AI development overwhelmed all other advancements. In the era of big science, equipment was the most expensive one, just as it had been. It was the result of three years since innovation started coming from scale. The singularity occurring in large language models began to be regarded as a kind of emergence.[18] Small-scale studies were no longer attractive. Deep learning researchers were anxious. It wasn't the diminished interest that was the problem. A spoonful of mild despair over what research could be done with just a few GPUs was more of an issue.

    Nevertheless, there were several innovations that emerged from the beginning of the year. In addition to training with well-defined data, there was a model tuning method where humans actually evaluate the answers and assign higher weights to better answers. The RLHF (Reinforcement Learning by Human Feedback) method, which applied reinforcement learning to language model training by involving humans in the middle, showed tremendous improvement in the performance of language models of the same size in InstructGPT in 2022. Numerous models began to apply RLHF. If there were scaling laws for model size, there would be no reason not to apply them. In March, µ-Parametrization,[19] which could tremendously reduce the cost of model training, was introduced. The conclusion of the study, which showed that it was possible to predict the hyperparameters of a large model in advance using a small model, relatively reduced the effort of parameter search when creating massive models. This research became the basis for GPT-4 training.

    As a result of the U.S.-China trade conflict, the United States banned NVIDIA's export of AI training GPUs to China. Within a few days, China announced a large language model trained solely with its own semiconductors[20]. Soon after, NVIDIA slightly changed the name of the same GPU with its GPU networking features removed and resumed exporting. The interest in the AI service sector continued to grow due to models like DALL-E2 and Stable Diffusion, and the market for generative AI models, such as images, began to fluctuate.

    In late November, OpenAI opened a chatbot service to the public. It was based on GPT-3.5, an improved version of GPT-3. An interesting point was that instead of creating a language model that excels at programming by training programming code on a human language model, it was created by training human language on a model trained with programming language data. It became clear how training with programming code influenced the logical structure training of large language models. In early December, the service named ChatGPT[21] sparked interest in large language models based on its tremendous accessibility open to the entire public.

    Towards the end of the year, my acquaintances in the AI field who seemed to be on the verge of losing their jobs were bewildered by the support that suddenly improved in real-time. The movement of companies that had been downsizing their AI organizations, riding the wave of endemic layoffs, came to a halt. Leaders who had been pressuring to downsize research organizations and evaluate outcomes just a week before were now shouting for AI. The requirements for model service frameworks suddenly began to change. The goal of large language models became commercialization. Models had become so large that there was no longer any meaning in distinguishing between computational resources for training and services. AI model training and services, which originally belonged to different domains, suddenly merged into one.

    Bigger problems await. Large language models consume an enormous amount of power. GPUs consume a tremendous amount of power. Although their power-to-performance ratio is tremendously better compared to CPUs, their absolute power consumption is too high. A node with 8 NVIDIA A100 GPUs[22] consumes about 7 kW, and a node with 8 H100 GPUs, the highest-performing GPUs as of 2023, consumes about 12 kW[23]. Since 2019, it has become no joke to say that you have to construct a building first to install the equipment. After experiencing power issues at a supercomputing cluster located in Brazil in 2021, we ported the entire platform to Arm-based systems. It was in anticipation of power issues becoming a problem a few years later. In the case of Microsoft, they shared their experience of building a GPU center right next to a hydroelectric power plant, taking into account the electricity costs[24].

    Weekends diminished. There was too much to do. There was no time. It wasn't just us.

    Now, no one had time.

    • Flan-T5, 110 billion parameters
    • GLM-130B, 130 billion parameters
    • OPT-175B, 175 billion parameters
    • BLOOM, 176 billion parameters
    • PaLM, 540 billion parameters


    On February 8th, Microsoft and Google made announcements about their large language model-based services at a 17-hour interval. Microsoft announced plans to introduce GPT models into its search engine Bing, its Office suite, and Windows 11. Google introduced Bard based on LaMDA. Baidu released Ernie Bot. The two companies promoted the future rather than tangible services. Tools that couldn't be tried out relatively failed to attract interest.

    The "era of AI price competitiveness" that I thought would come someday had arrived. However, the initial expense itself was too high. Services like ChatGPT and Bard consume service costs that are too expensive to be explained by economic logic.[25] It corresponds to a future that has been brought forward too quickly by competition. It was after everyone had firsthand experience of that future. The problem was that expectations had skyrocketed.

    The suddenly approaching large language model services are creating another bottleneck. Services that perform inference based on CPUs have been affected by the significant reduction in memory bandwidth per CPU core. This is because the number of cores per CPU has rapidly increased. Services that perform inference based on GPUs lack both the capacity and speed of GPU memory where models are loaded. The bottleneck of memory, which was expected to come someday, suddenly became a direct problem due to the commercialization of large language model services. It was an anticipated bottleneck since 2021. CPU and GPU developers such as Intel, AMD, and NVIDIA had prepared for this situation in advance. From the end of 2022 to early 2023, they introduced various hardware such as Intel's Xeon Max, AMD's MI200, and NVIDIA GraceHopper.

    If an AI model is extremely large, computational power becomes relatively less important. When NVIDIA A100 was first announced, it was released with a 40 GB model, but a year later, they released an 80 GB memory model. Whether it was the training process or the inference process, the size was too large to load and unload models from memory. Additionally, the process of "inferring" large language models brought about a reversal in thinking about GPUs and NPUs. Unlike the training process, which requires constantly updating weight matrices, the inference process operates by flowing input data through a fixed model structure loaded into memory and observing the results. Therefore, the proportion of computation is tremendously reduced, and the speed of memory becomes extremely important. In the second half of 2022, NVIDIA announced the H100 with 80 GB of memory capacity. However, less than half a year later, when hardly anyone had received the actual H100, they introduced the H100 NVL with 188 GB capacity.[26]

    Meta introduced LLaMA[27], a language model that could be run on personal servers with some effort. Despite being attached with all sorts of license restrictions, LLaMA spread through illegally leaked copies, and Alpaca-LLaMA, fine-tuned by Stanford, showed the possibility of achieving significant performance even with (relatively) small models. Since then, various language models without license issues have been continuously released[28], fueling the potential of open language models while raising new questions about what size of parameters would be satisfactory. If the model is small, emergence are not discovered, and it cannot be used as a multi-modal model. If the model is large, it costs too much money to actually operate.

    How large can large language models grow? Signs of preparation for even larger models can be seen everywhere. Microsoft's DeepSpeed framework, which is most commonly used for distributed model training, added ZeRO Infinity[29] in 2021, an extension that can train models with 1 trillion to 10 trillion parameters by utilizing NVMe SSDs. However, models with such a large number of parameters are practically impossible to serve. In practice, the approach is to set a limit on the model size that can be served and fine-tune within that range. Technologies like ZeRO were developed to train ultra-large-scale models, but they are being widely applied as they enable fine-tuning with very few resources.

    • PaLM-e, 560 billion parameters
    • Pythia, 12 billion parameters
    • LLaMA, 6.5 billion parameters

    And numerous other models with ~12 billion parameters

    Various attempts are being made on numerous 12-120 billion parameter models that are 'good at talking.' LLaMA unintentionally spread foundation models that individuals could experience. Many people realized that the level of "models that are good at talking" that can satisfy ordinary people had been achieved long ago. Individuals or organizations with some computer knowledge and the ability to spend money have gained the courage to attempt fine-tuning language models in various ways.

    At the same time, it is becoming known that the computational resource requirements of models beyond just being good at talking are on a different level. For about half a year, the size of newly emerging large language models has been maintained at less than 600 billion parameters. It could be that further size expansion does not yield enough results, or it could be a technical barrier created by the current hardware and costs. Or, it could be a movement to keep the size below a certain level because that size is in a range where commercialization is impossible.

    Backend.AI, which started as an open-source project with 4 GPUs in 2015, handles several thousand GPUs in 2023 and will soon reach ten thousand. All environments, including ours, have changed tremendously. The more you dig into problems, the more problems keep emerging like potato stems. While living and solving numerous problems entangled with the size of large language models, I sometimes wonder where the end of this problem will reach.

    On nights full of thoughts, I sometimes think that, like the Turing test that unknowingly drifted away from our attention, we may have all passed a certain point without realizing it. It seems like we have either solved a problem that needed to be solved or solved a problem that shouldn't have been solved yet. Complex emotions of excitement turning into dizziness and expectations turning into depression come and go.

    • [^Editor's note] Crossroads is a science web journal launched by the Asia Pacific Center for Theoretical Physics, aimed at 'Science, Future, and Humanity' to showcase a scientific vision of the future through various genres of scientific writing, including science features, essays, columns and novels.
    • [1] Facebook Post
    • [2] There are various parameters besides the connections between neurons, but for convenience, it has been tremendously simplified as the model size becomes relatively small.
    • [3] Physical computers in their raw form, not virtual machines. In the cloud, it is common to run virtual machines on top of a hypervisor or manage them based on containers to reduce management overhead and flexibly manage resources. Due to cost issues, it has not yet been popularized in small research institutes and universities.
    • [4] Job Scheduler. Software that helps manage and execute processes. Slurm and others are commonly used.
    • [5] (2015). Since 2020, OpenAI has not released implementations, and since 2023, they have only provided tech reports instead of papers. As of 2023, there are various opinions on whether OpenAI is still an AI development organization that pursues openness.
    • [6], Google (2015)
    • [7] "AlphaGo - The Movie" For those who couldn't feel the atmosphere at the time, refer to the documentary (2018)
    • [8] J. Shin "Creating AI chatbot with Python 3 and TensorFlow" PyCon APAC 2016 (Korean) / (English) (2016) Although there are various presentation videos on the same topic as I had the opportunity to introduce it in several countries, these two are the first presentations.
    • [9] J. Shin "Android Dreaming of Electric Sheep: Implementing Chatbot Emotion Model Using Python, NLTK, and TensorFlow" PyCon KR 2017 (2017)
    • [10] A record of an interview at the Google Startup Campus remains on YouTube.
    • [11] "Transformer (machine learning model)"
    • [12] J. Devlin et al., "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding" Arxiv:1810.04805 (2018)
    • [13] Y. Liu et al., "RoBERTa: A Robustly Optimized BERT Pretraining Approach" Arxiv:1907.11692 (2019)
    • [14] A. Radford et al., "Language Models are Unsupervised Multitask Learners", (2019)
    • [15] J. Kaplan et al., "Scaling laws for neural language mod- els" Arxiv:2001.08361 (2020)
    • [16] C K. Joshi, "Transformers are Graph Neural Networks", The Gradient (2020)
    • [17] Microsoft, "DeepSpeed: Extreme Speed and Scale for DL Training and Inference", (2019)
    • [18] J. Wei et al., "Emergent abilities of large language models" Arxiv:2206.07682 (2022)
    • [19] E.Hu, G. Yang, J.Gao, "Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer" Arxiv:203.03466 (2022)
    • [20] A Zeng et al., "GLM-130B: An Open Bilingual Pre-trained Model" Arxiv:2210.02414 (2022)
    • [21] OpenAI, "Introducing ChatGPT" (2023)
    • [22] If a computer installed in a data center cabinet called a rack is considered a node, a single node with 8 A100 GPUs typically occupies 6 to 8 slots in a rack, and a rack can accommodate around 40 nodes.
    • [23] The power of a single floor in a typical university building is around 100 kW.
    • [24] "NVIDIA Teams With Microsoft to Build Massive Cloud AI Computer" (2022)
    • [25] According to my personal estimate, in the case of ChatGPT, the cost based on GPT-3.5 is over $42 per month. Refer to the link for the calculation process. Facebook Post
    • [26] "NVIDIA H100 NVL for High-End AI Inference Launched" (2023)
    • [27] H Touvron et al., "LLaMA: Open and Efficient Foundation Language Models" Arxiv:2302.13971 (2023)
    • [28] Representative examples include Dolly 2 (2023), which combines EleutherAI's Pythia-12B model with its own data.
    • [29] S. Rajbhandari et al., "ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme-Scale Deep Learning", Arxiv:2104.07857 (2021) To load a model with 1 trillion parameters onto a GPU without memory offload for training, 320 NVIDIA A100 GPU (80 GB) models are required.

    25 March 2024

  • 2023 Winter Intern in Lablup

    By Byeongjo Kim


    I applied to the Open Source Contribution Academy (hereinafter referred to as the Contribution Academy) hosted by OpenUp, and worked as a fall intern at Lablup Inc. (hereinafter referred to as Lablup) from November to December for 8 weeks. Afterwards, I extended it for an additional 8 weeks from January to February, working a total of 16 weeks.

    After being discharged from the military, I have written about the experiences I had while working at Lablup, my first company as a developer.

    Motivation for Applying

    Even before the Contribution Academy, I was interested in Lablup, and coincidentally, I had an opportunity to contribute through the Contribution Academy.

    During the Contribution Academy period, I worked on resolving issues and refactoring the webui of Backend.AI.

    While participating in the Contribution Academy, I felt a lot of affection, interest, and enjoyment towards Backend.AI, and I began to think that I wanted to continue contributing after the program ended.

    Lablup happened to provide an opportunity to work there in conjunction with the Contribution Academy, so I applied without hesitation.


    For the first 3 weeks of the internship, I underwent an onboarding process.

    I went through implementing a RealTime Chat, setting up the Backend.AI environment, and then the Pebble Seminar in that order.

    RealTime Chat

    This was the first assignment to become familiar with the core side of Backend.AI's code. I implemented a real-time chat app using Python, utilizing the aiohttp, aioredis, and asyncio libraries.

    Since there was no condition to store the chat contents, I used the in-memory database redis.

    I made it so that when a user enters a chat room, they subscribe to that chat room, and when a user enters a message, it publishes the entered message to the subscribed users, meaning the other users in the same chat room.

    RealTime Chat in action

    While I was able to handle Python at a basic level from preparing for coding tests, I had no experience using libraries like aiohttp, asyncio, and aioredis, so it took me some time to understand and grasp the concepts.

    However, this assignment helped me a lot in understanding the core side of Backend.AI's code, and it was good to be able to study new libraries.

    Setting up the Backend.AI environment

    Since I had already installed Backend.AI during the Contribution Academy, setting up the environment during the internship period wasn't too difficult.

    However, I was well aware that installing Backend.AI is not easy, as I had encountered many errors and failures while trying to install it during the Contribution Academy, and the other person doing the internship with me also had a lot of difficulties during the installation process.

    Since I had already experienced those failures and knew the solutions, I was able to help, and we were able to install it quickly and move on to other tasks.

    While setting up the environment, I also configured a virtual machine and VPN, and set up the environment on a virtual machine as well, so that I could work even if there were problems on my local machine. After setting up the configuration on the virtual machine, I mainly used the local for development during the subsequent work, and the virtual machine as a test server. The company's VM Farm, which allows for easy management and configuration of virtual machines, made it great for setting up development and testing environments.

    Pebble Seminar

    After completing the RealTime Chat and setting up the Backend.AI environment, I prepared a short seminar based on understanding the structure and code of Backend.AI. I was tasked with presenting on GraphQL and Relay, which are used in the Backend.AI WebUI.

    While I had experience with GraphQL, I felt that my knowledge was lacking for presenting in front of others, and Relay was a new library to me, so I was quite worried about preparing for the Pebble Seminar and read through many documents to prepare. First, I familiarized myself with the concepts by reading the official documentation for GraphQL and Relay, and then analyzed the Backend.AI code one by one to understand how they were applied and functioning in Backend.AI.

    Pebble Seminar preparation materials

    By analyzing the code while preparing for the Pebble Seminar, I naturally came to understand the code running in the WebUI, and this greatly helped me in finding and resolving issues during the subsequent work.

    Resolving Backend.AI issues and implementing features

    After completing the onboarding, I finally joined the frontend team and started resolving Backend.AI issues and implementing features. I had a coffee chat with the frontend lead to define the categories of work for this internship period:

    1. Creating a Table Column Setting component
    2. Researching E2E Testing
    3. Daily tasks

    During the 8-week internship period from November to December, I created a total of 19 Pull Requests, of which 18 were merged, and 1 is still under review. Since I had experience finding and assigning issues during the Contribution Academy, I had less difficulty with it, and because I enjoyed resolving issues, I was able to resolve more issues than others.

    Feature Addition PRs

    1. Implementing Table Columns Setting

    This was one of the issues I aimed to work on during the internship period. It was the only component that I conceived and implemented from scratch during the fall internship, rather than refactoring an existing component. Before implementing this feature, I thought it was a simple task that I could finish quickly, but things turned out differently from my expectations.

    First, I realized that I had been thinking too simplistically about creating new components. Even though I had designed and considered the props to be received before creating components in the past, through this issue, I felt that when creating a new component, I should invest more time and effort while considering scalability. I also realized that I should pay more attention to how other sites are designed and what features are applied.

    Table Columns Setting

    2. Adding service endpoint and owner columns to the Model Serving page table

    Previously, when creating a model service, users had to go to the detail page to check the endpoint, which is a frequently used feature. So there was a request to add the endpoint to the table column. Additionally, since the admin account can see services of users in the same group, there was a suggestion to have a column showing the service owner. Since the GraphQL fields for retrieving this data were already implemented, I added the fields to the query to fetch the endpoint and service owner data, and then added columns to the table to display the data. The owner column is only shown for admin accounts.

    Implementation view. Screen for admin account (left) and user account (right)

    3. Disabling log button for sessions in CANCELLED state

    The CANCELLED state means that the container has never been created or failed to be created. Previously, the log button was enabled even for sessions in the CANCELLED state, and if a user clicked the log button, the agent could not find the container information, resulting in a 500 error. In this PR, I made it so that the log button is disabled for sessions in the CANCELLED state, preventing users from clicking it.

    Session in TERMINATED state (session 1) and CANCELLED state (session 2)

    4. Testing and creating a custom hook for dark mode

    Before implementing dark mode, I found components with hardcoded colors and implemented a custom hook named useThemeMode for applying dark mode. When creating the custom hook, I tried to use the useLocalStorageState hook from ahooks, but contrary to my expectations that it would automatically manage states with the same key value, I found that they operated independently. To handle states with the same key value automatically updating when the value changes, I added a custom hook named useLocalStorageGlobalState, and then used that to create the useThemeMode custom hook for setting the dark mode.

    Bug fix PR

    1. Allowing signup without invitation token

    In the config.toml, when the allowSignupWithoutConfirmation option is set to true, users can sign up without an invitation token. However, when a user clicked the sign up button, an error occurred because the token value was undefined. In this PR, I modified it so that if allowSignupWithoutConfirmation is true, the token variable is not used. Additionally, previously users could modify other input values while the core was processing the data after clicking the sign up button, and the previous data remained when the dialog was closed and reopened. In this PR, I made it so that other input values cannot be entered while data is being processed, and the previous input values are cleared when the dialog is closed.

    2. Displaying the correct screen for the selected sub-tab on the user management page

    On the user management page, there are sub-tabs for displaying active users and deactivated users. However, if a user navigated to another page and then returned to the user management page, even though the sub-tab was set to inactive, the screen displayed the list of active users, causing confusion. This PR resolved that issue by remembering the current sub-tab when navigating to another page, and displaying the appropriate screen for that sub-tab when returning to the user management page.

    Before (left) and after (right) the fix

    Extending the internship

    As I resolved issues, the 8-week period flew by, and it was time to wrap up the fall internship.

    Working at Lablup as my first job after being discharged from the military was an important period for me. During the internship at Lablup, I was able to experience my strengths and weaknesses, what skills I needed to further prepare, and how other developers work. The 2-month period felt very short, and since I had enjoyed working so much during that time, I wanted to continue working. So I expressed my desire to extend the internship to the lead, and we agreed to extend it for another 8 weeks until February. During the fall internship, I had thought a lot about my weaknesses, but I couldn't find my strengths. So, I started with these 3 personal goals:

    1. Find my strengths during this period
    2. Read the documentation whenever I have time
    3. Work even harder, leaving no regrets

    Resolving issues and implementing features during the extended period

    The work during the extended period did not differ much from before. Without the onboarding process and installation, I could focus more on resolving issues.

    Feature Addition PRs

    1. Refactoring ErrorLogList

    I refactored the ErrorLog List, which was previously implemented using Lit elements, to React. This feature was the most satisfying issue for me, as I personally use it frequently after the refactoring.

    Before (left) and after (right) refactoring

    During the refactoring, new Search and Error filter features were added.

    Added Search feature (left) and Filter feature (right)

    2. Modal drag functionality

    I used the React-draggable library to add functionality for modals to be dragged. By adding the Draggable prop to a modal, it can be applied to any modal that requires dragging.

    Draggable Modal

    By clicking the icon on the left side of the modal title and moving the mouse, the modal can be moved to the desired position on the screen.

    Currently, it is applied to the modal for viewing user information on the user management page and the modal for changing user settings, where you can check it.

    While it is not being used in many places yet, I think this PR will be useful as more components and features are added.

    Bug fix PR

    1. Modifying Vfolder invitation permissions

    There was an issue where the user permissions for group vfolders were not being updated. When trying to modify the permissions, the items were not displayed or selectable properly in the select. Previously, the items were being displayed using option tags, but I changed it to use mwc-list-item to display the items and modified the overflow option to resolve this issue.

    Before (left) and after (right) the PR

    2. ResourceGroupSelect extending outside the card

    There was an issue where the ResourceGroupSelect value would be displayed outside the card if it was too large.

    Symptoms of the issue

    To resolve this issue, I set the max-width CSS on the Select component so that it cannot exceed the width of the card.

    Additionally, in this PR, I added a Search feature to the Select component, for which I used the useControllableValue hook from ahooks. The useControllableValue hook helps manage props from either the parent or itself. While it was a simple PR, it took me longer than expected since it was my first time using useControllableValue. I was able to resolve this issue with the help of the lead and another intern.

    3. Key pair list not showing when clicking the generate & manage key pair button on the summary page

    On the summary page, there are buttons for "Generate New Key Pair" and "Manage Key Pairs." However, when clicking these buttons, instead of showing the key pair list, it simply navigated to the user management page, displaying the user list.

    "Generate New Key Pair" and "Manage Key Pairs" buttons on the summary page

    When clicking the "Generate New Key Pair" button (left) and when clicking "Manage Key Pairs" (right)

    While this issue was not critical, I resolved it because I had experienced a lot of confusion when I first used Backend.AI and didn't fully understand the key pair feature.

    After resolving this issue, I could confirm that the key pair list was displayed on the screen as intended.

    After resolving the issue, when clicking the "Generate New Key Pair" button (left) and when clicking "Manage Key Pairs" (right)

    Completing the Internship

    Thanks to the Contribution Academy, which started from a friend's recommendation after being discharged from the military, I was able to contribute at Lablup for an extended period. Since I had no previous internship or project experience at other companies, this was a very important period for me as I was starting anew after being discharged. It was great that I could experience my strengths and weaknesses, the skills I lack, and the culture of an open-source company at Lablup. How many companies make you want to go to work every day with their horizontal structure, free atmosphere, pleasant working environment, and good equipment? Although I worked at Lablup for 4 months, I felt like I wanted to go to work every day, and if it was Lablup, I could work at a company while doing interesting and desirable work for a long time. Over the 4-month period, I also developed a fondness for Backend.AI, the service provided by Lablup, and I plan to attend the conference hosted by Lablup every year whenever possible to see its advancements and technologies.

    Lablup Office

    This post was also published on the author's personal blog. This post is automatically translated from Korean

    11 March 2024

  • Summer 2023 Internship Reviews

    By Dongjin Park


    I applied to CUop, a collaboration between universities specializing in science and technology, and worked as an intern at Lablup for 8 weeks.

    I wrote about my experiences through onboarding, developing Backend.AI, and attending PyCon.

    Motivation for applying

    I first learned about Lablup through a PyCon presentation session I stumbled upon. I could tell that the company had a lot of technically deep and passionate members. I applied to Lablup because I was interested in Python and asynchronous programming.


    During the first two weeks, we conducted an onboarding process.

    We went through the Realtime Web Chat implementation, Backend.AI development environment, and code base seminar.

    Realtime Web Chat

    This is an assignment to familiarize yourself with Python asyncio. You will develop a real-time chat app using aiohttp, an asynchronous web framework, and redis, an in-memory database. The task also includes setting up docker compose to build python and redis at once. For more information, see the Github Readme.

    To broadcast the messages through redis, we used Redis pub/sub. pub/sub acts as a platform that delivers messages without storing them. Since we didn't have a requirement to store messages, we used Redis pub/sub. We also registered the Redis pub/sub process as a task using asyncio.create_task() to run it in an event loop. Realtime Web Chat Launch Screen Realtime Web Chat Launch Screen

    I was able to understand the basic behavior of asyncio. I was able to ask questions and solve the difficult parts. I think Lablup has a great environment for interns and junior developers to grow because they can freely ask questions through Microsoft Teams.

    Build the Backend.AI development environment

    I installed Backend.AI on a VM Farm and a local VM and tried to run it myself. I read the official documentation, but the process was not smooth. I encountered various errors, which I solved by sharing with my fellow interns and asking questions on Teams.

    💡 A VM farm is an environment where virtual machines are managed and run. Lablup has its own VM Farm.

    This was my first experience of developing with a VM Farm and VSCode connected via SSH. To develop Backend.AI, I need to run multiple processes (Manager, Agent, Storage proxy, Web server, etc.) and Docker Containers. If I use a laptop, it's a bit of a battery drain. However, with VM Farm, you can develop Backend.AI lightly because all you need is an SSH connection. In fact, I was able to develop for a long time using VM Farm when I was out of the office and couldn't charge my laptop.

    Code Base Seminar

    After looking at the code, focusing on the difficult parts of Backend.AI, I prepared a seminar based on my understanding. I was in charge of presenting the Manager, Agent, and GraphQL parts.

    Since Backend.AI is open source, the official documentation is well written. I studied the architecture of Backend.AI by looking at the official documentation to understand the overall structure and asking the employees directly if I had any questions. Since the topic of the presentation was session/kernel creation and scheduling control of Backend.AI Manager, I studied the manager code and analyzed the logs of the manager process. A sequence diagram I drew in preparation for a seminar presentation. A sequence diagram I drew in preparation for a seminar presentation.

    Analyzing the logs, I found a bug that caused the session state to change from Preparing back to Pulling. It was rewarding to analyze the logs one by one. It was difficult to analyze the logs in order because of the asynchronous code base, but drawing a call graph and a sequence diagram was very helpful.

    Develop Backend.AI

    After onboarding, I started working on Backend.AI. I worked on GitHub issues and volunteered to help or found and resolved issues myself.

    I created 9 pull requests from the Backend.AI repository repository and 2 pull requests from the Backend.AI-WebUI repository, and they all merged!

    I chose high-priority issues that I was confident in addressing. I wanted to make as many contributions as I could in the short time frame of two months.

    First PR

    I created a PR to fix a bug I found while preparing for a seminar. It was an easy PR to fix the API parameter. However, it was a good experience to learn about branch name convention, commit convention, and news fragment, and to experience the CI (Continuous Integration) process with GitHub Actions and to do some Git-related shoveling beforehand.

    💡 A news fragment is a one-sentence Markdown description of what the branch created by the PR is trying to do. You want to keep it simple and clear so that if you see this PR again in the future, you'll know what it was trying to do.

    PRs for vfolder

    When I heard that Teams had an open issue for an intern, I jumped at the chance to apply. I had to learn a new concept called vfolder, but I knew it would be important to get to know the product.

    PR (1)

    Only admin can create a project type vfolder. It should be possible to create a vfolder regardless of the max_vfolder_count of the keypair resource policy, but if the user type vfolder exceeds the max_vfolder_count, the project type vfolder cannot be created. At first, I was confused by the terminology, but by analyzing the code and asking questions, I was able to interpret the terminology.

    PR (2)

    Fixed new bugs discovered while addressing PR (1).

    PR (3)

    I found an issue related to the PR (1) issue. DB migration and GraphQL were new to me, but I wanted to try them out, so I supported them. I used a DB migration tool called Alembic, studied GraphQL schema concepts, and modified my query and mutation code to support backward compatibility. I tried to use cURL to test the modified code, but GraphQL has a much longer request format than the REST API, which was cumbersome. I wrote the test code by asking questions to interns and employees who are familiar with GraphQL. I wrote python code to test the modified query and mutation in the form of CLI to make testing easier.

    WSProxy related PRs

    Supported an issue in Teams. There was a bug in the WebUI where it was not possible to delete a session if the wsproxy address of the resource group was invalid. I also wanted to get some experience with WebUI development.

    PR (1)

    I read through the WebUI code to troubleshoot the issue, but I couldn't quite grasp the concept of wsproxy. I realized that wsproxy has v1 and v2, but it was not easy to understand the difference between the two, so I asked the employees. The main difference between v1 and v2 is the path of the traffic: v1 goes through the manager to communicate with the container, while v2 can communicate directly with the container without going through the manager, which is faster. Once I understood what wsproxy does and the difference between v1 and v2, it was easier to understand how the code worked, and I realized that a lot of people didn't know the difference. I realized that questions that seemed easy to ask might not have been asked in the organization.

    PR (2)

    I also modified the webui code to fix the issue. To fix the JavaScript code, I learned about callback functions, promise objects, and async/await. I handled errors so that they didn't affect other logic, and defined duplicate code as functions to eliminate duplication of code.

    PR (3)

    However, since the WebUI needs to be backwards compatible with Backend.AI 22.09, we had a review from our CEO that it should also handle HTTP Status 404, so we made it handle both 404 and 500.

    PR (4)

    However, after the code was merged, a bug occurred: when setting up v1 wsproxy, the return value for wsproxy-version disappeared. This happened because I was modifying the core code and didn't handle all the branches. I fixed the code in a hurry, but it was a simple mistake, and I realized that I should write test code to prevent such mistakes.

    PRs related to Manager

    In preparation for the seminar, I came across an issue with manager that I had been studying. With my internship coming to an end, I thought I could contribute by resolving the issue on the code I knew best.

    This PR has been heavily modified by code reviews. Initially, we designed scheduler health check and scheduler trigger as the same API. After receiving code reviews, I split the two functions into different APIs to separate responsibilities. We originally stored health information only for the schedule function, but we also stored health information for the prepare function and the scale_services function because we needed to create the trigger API after storing health information for the three functions that run periodically according to the scheduler's global timer to get a complete picture of the scheduler's health. We also changed the design so that we can store the scheduler's health based on the manager ID because there may be multiple manager processes.

    The scheduler's storage was also reviewed. Initially, we looked at the code in the existing manager state API to store manager state in etcd and did the same for scheduler state. etcd is good for storing configuration information that needs to be kept consistent, but it is slow to write. redis is a volatile database, but it performs well even with many reads/writes. Since the scheduler state is read/written periodically and does not need to be consistent, we switched to storing it in redis.

    Agent-related PRs

    Now that I had a good understanding of the Manager part of Backend.AI, I wanted to understand another important component: the Agent. I came across an issue about the Agent, so I took a look.

    While Backend.AI was running, there was a bug where the internal state of the Agent did not match the state of the actual working container. As a result, when creating a session, an InsufficientResource Error was thrown during the resource allocation process, even though there were actually enough resources. When the error occurred, we needed to improve the logging to understand what went wrong during the resource allocation process.

    It took a while to figure out the resource allocation process. The concurrency issues were difficult, and it took a lot of Q&A with the CTO to get a general idea of the flow and what to log.

    A few weeks after my internship ended, the CTO merged it with more than 10 commits, adding refactorings and test code. What impressed me was that he wrote test code to reproduce the error. I had to go through a complex process (see PR) to reproduce the error myself, which took a lot of development time. I could see the difference in productivity here. Of course, I thought about writing test code, but I thought the implementation would be too complicated, and I thought that writing test code would be the end of my internship. In the future, I shouldn't be so intimidated by writing test code and just try it out and learn as I go. Also, it seems that the refactorings were done with a focus on code readability. The functions in the part I modified were too long and not very readable, but after refactoring, the functions were shorter and the logging was cleaner. I realized that I shouldn't stop at implementation, but try to make good code.

    Attending PyCon

    On August 12 and 13, Lablup will have a booth at PyCon. Companies that sponsor PyCon are given the opportunity to have a booth. I was an intern, but I wanted to participate in the booth activities and listen to some of the talks. The company had some PyCon tickets left over, so I was able to participate.

    At the Lablup booth, we had an event where we challenged Llama2 to write a 10-line pyramid at a prompt. The challenge wasn't that easy, and it was important to explain it in a way that Llama2 could understand. Two lucky people who submitted the correct answer were drawn to win a Nintendo Switch and a Logitech mouse. My role at the booth was to help direct PyCon attendees to the event, and since there were a lot of employees at PyCon, I was free to attend any talks I wanted to hear. Since Rableup is an open source company, we encourage people to contribute to open source and participate in conferences. In fact, four of our presenters attended PyCon, so we value participating in conferences.

    Lablup Inc. booth Lablup Inc. booth

    During the RustPython session, a tool called ruff was introduced as a replacement for the python lint tools flake8 and isort. Since ruff is composed of Rust, it is 100x faster than flake8. At Backend.AI, we were using flake8 and isort for lint, but after our CTO reviewed ruff, I watched him adopt ruff for the Backend.AI project right on the stairs of COEX. I realized that he was really good at the process involved in coding, applying new tools to the project in a short time and even revising the official documentation. I realized that I want to be a proficient developer someday. After PyCon, I read the updated official documentation, applied ruff to the Backend.AI development environment, and experienced linting 100x faster. If I hadn't participated in PyCon, I wouldn't have gotten up to speed on these great tools so quickly. I hope to continue participating in developer conferences in the future. Group photo with members of Lablup Inc. Group photo with members of Lablup Inc.

    End of internship

    During my internship, I tried to get as much experience as possible, and I wanted to contribute a lot. In the end, I was able to experience a lot because I tried to contribute a lot. I was quick to volunteer for issues that came up in Teams, so I was able to understand the core components of Backend.AI: vfolder, wsproxy, web-ui, manager, and agent. I also learned new concepts like DB Migration, GraphQL, and etcd. Although it was a bit physically demanding to attend the conference from morning to evening on the weekend, it was fun to listen to more than 10 presentation sessions, get inspired, and meet various people through booth activities.

    During my internship, I think I actively asked questions about things I didn't understand, which helped me to solve issues quickly. I think the reason why I was able to ask a lot of questions was because there was a culture of horizontal rabble-rousing, and there were many people who were kind enough to answer my questions, so I was able to actively ask questions. I would like to take this opportunity to thank the members for their support.

    I was able to experience a variety of things, including asynchronous programming experience, GitHub collaboration, presenting English seminars, and attending conferences. I feel like I've grown a lot as a developer through this program. I recommend the Lablup internship to anyone who is thirsty for growth.

    This post is automatically translated from Korean

    22 November 2023

  • 2023 Lablup DevOps Summer Retrospect

    By Gyubong Lee

    In this post, I'll share my experience as a developer at Lablup over the past 9 months.

    Table of Contents

    • Motivation to apply
    • From Intern to DevOps!
    • rraft-py Development
    • Open Source Contribution Academy Regional Sprint Backend.AI Mentoring
    • Attending various conferences
    • 2023 Open Source Contribution Academy
    • Presenting at PyCon
    • Conclusion

    Motivation to apply

    Even before I joined Lablup, I knew that I wanted to have a career where I could continue to help others through the programs I develop, whether as a hobby or during work hours.

    Open source was particularly appealing to me because it meant that not only could my code help others, but that they could freely modify and utilize it if they wanted to.

    One of the things I realized after working on my own project, Arvis, for my graduation project, is that it's not really easy to keep a project going simply because it's something I love to do, as it keeps growing in size. I tried to plan and execute the project carefully from the beginning, but in the end, I realized that I underestimated the time and effort required to maintain the project.

    In that regard, Lablup, which actively encourages and supports open source-related activities and even develops core parts of its source code as open source, was the company of my dreams.

    From Intern to DevOps!

    The last three weeks of My internship at OSSCA Lablupwere spent studying and researching distributed systems, specifically implementing the Raft algorithm. Although my job title changed from intern to DevOps, I still felt like I was expanding on my internship learning, including Raft, to solve issues I worked on during my internship.

    I've been involved in a variety of other activities that I'll mention below, but my main work at the company to date has been writing a Python binding of the Raft algorithm implementation to replace the existing distributed lock structure, including writing rraft-py, and thinking about how to integrate it with Backend.AI.

    rraft-py Development

    rraft-py is a Python binding implementation of tikv/raft-rs, and you can read more about it in the GitHub Readme / Wiki. I'll also be presenting some technical details on the topic in my PyCon 2023 KR talk next month, if you're interested.

    For now, I'm going to focus on my experience as a Lablup developer, leaving aside the technical details of what I learned while developing rraft-py.

    I had to think a lot about rraft-py because it was not just about fixing an issue in Backend.AI, but also about creating a separate project and integrating that project with Backend.AI.

    Overall, there were several mile stones in the project, and I feel like I was able to move forward with the project with a little more stability after each mile stone. There was definitely a high sense of accomplishment each time, but there were also many times when I was frustrated because I realized later that the code I had initially written didn't work the way I intended. But Lablup allowed me the time to do these shoveling sessions, and I think I've gotten to where I am today because of the things I've learned that I would have otherwise dismissed as "shoveling".

    Results of running the rraft-py example code

    There's still a long way to go to integrate rraft-py into Backend.AI, but the bottom line is that it's great to have the experience of thinking for yourself and making your own decisions as you continue to evolve your project, and for developers who like this kind of experience, Lablup could be one of the best options out there.

    Open Source Contribution Academy Regional Sprints Backend.AI Mentoring

    While rraft-py development was my main focus, as it required more time than I had anticipated, I also had the opportunity to work on a variety of other projects.

    One of the most memorable experiences was participating in the 1st Daegu Open Source Contribution Academy Regional Sprint as a Backend.AI mentor.

    In fact, I participated as a mentor without a deep understanding of Backend.AI, and to make matters worse, the sprint period was only 2 days, so I was worried about many things.

    In order to make sure that the mentees learn at least one thing and go home as satisfied as possible, I had to think about how to explain Backend.AI to those who don't know it at all, and how to build a development environment on different platforms (personally, I usually only develop on macOS + docker desktop environment, but some of the mentees were working on Windows environment, so I had to shovel while building the development environment). I had to think about a lot of things and prepare.

    In conclusion, I was able to learn a lot more than I thought because I was unfamiliar with these processes, and the mentees followed along better than I thought, so I think it was a meaningful time for everyone to create more than one PR.

    The 1st Daegu Open Source Contribution Academy Regional Sprint

    Participation in various conferences

    We had the opportunity to participate in various conferences and exhibitions such as AI Expo, AWS Summit, and Next Rise. It was great to learn how to explain Backend.AI to different types of people, and it was also interesting to see the different technologies of other companies.

    AI EXPO KOREA 2023

    2023 Open Source Contribution Academy

    As a company with an open source culture, Lablup actively participates in the Open Source Contribution Academy every year. This year, I also participated in the Open Source Contribution Academy, which encourages participation in various other projects besides the Backend.AI team, so I've been working on GlueSQL as a mentee.

    I think this culture of freedom is very attractive to developers with a strong desire to grow.

    (In addition to myself, there are two other people involved in other projects in the 2023 Contribution Academy).

    PyCon announcement

    Based on my experience in developing rraft-py at my company, I was also given the opportunity to present at 2023 PyCon KR.

    Personally, I'm a bit nervous because it's my first time presenting in public, but I'm doing my best to prepare. For those who are interested in presenting, I am looking forward to sharing not only the presentation materials but also the source code and work history through GitHub.


    Lablup is a company with a strong open source culture, encouraging participation in various open source and community-related events such as the Open Source Contribution Academy ( and PyCon, and giving developers the opportunity to take initiative in their work.

    I hope to continue to participate, learn, grow, and contribute to open source activities of various nature at Lablup.

    This post is automatically translated from Korean

    18 July 2023

  • Learning from history

    By Sanghyeon Seo

    Learning from history

    ChatGPT and other language giants that have gained traction in the last half-decade didn't just fall out of the sky out of nowhere. We've seen it many times throughout history, where a cumulative development of technology reaches an inflection point and rapidly transforms society. Sometimes, the path to that inflection point is strikingly similar for technologies that evolved in very different times and contexts.

    Mankind has long wanted to fly - it's no coincidence that Civilization 6 has "Dream of Flying" as its theme song.

    How did the Wright brothers realize their dream? In 1899, Wilbur Wright writes to the Smithsonian Institution and asks them to send him what is known about airplanes. After three months of reviewing the material he receives, Wilbur concludes that not much is known about flight, except that it's a problem. Plausible theories have turned out to be false and improbable theories have turned out to be true, he writes, so he can't believe anything he hasn't seen for himself.

    What Wilbur wanted to know from his literature review was this: what do we need to know about flying? What of it is already known? What are the remaining problems that need to be solved? Surprisingly, Wilbur was able to answer all of these questions from his literature review, something his competitors were unable to do.

    In a 1901 lecture, Wilbur summarized his conclusion: "There are three problems with flying. You need wings to make the airplane float, you need an engine to make the airplane go, and you need a way to control the airplane."[^1]

    [^1]: Some Aeronautical Experiments (Wilbur Wright, 1901).

    Wilbur saw that the wing problem and the engine problem had been solved to a certain extent, so he needed to solve the piloting problem. To solve the piloting problem, he needed an airplane, and to build an airplane, he needed to solve the piloting problem. Wilbur concluded that he could solve the problem of controlling an airplane by taking the problem of controlling a glider.

    To test the glider, he needed high hills and strong winds, and they had to be sand dunes for the safety of the experimenters. In 1900, Wilbur requested data from the Weather Bureau to review the windiest places in the United States. The staff at the Kitty Hawk Weather Station wrote back that the beach next to the station was unobstructed and would be suitable for the experiment."[^2]

    [^2]: Letter from J. J. Dosher, Weather Bureau, to Wilbur Wright, August 16, 1900.

    The 1901 experiment was a disappointment: the wing didn't have enough lift. The Wright brothers had used data from the Auto Lilienthal to calculate the area of the wing, and they had become suspicious of its accuracy.

    After analyzing their experimental data, they concluded that John Smeaton's value of the proportionality constant, which had been used without question for over 100 years, including by Otto Lilienthal, was incorrect.

    In order to systematically analyze the lift of wings without the time-consuming and laborious glider experiments, the Wright brothers built a wind tunnel. Their analysis showed that the data from the Auto Lilienthal was correct, except for the incorrect value of the proportionality constant, but the wing used by the Auto Lilienthal was inefficient.

    The 1902 glider that resulted from this analysis had a larger area to offset the revised value of Smithen's constant and a flatter shape to increase efficiency. (They changed the flatness of the wing from 1/12 to 1/24.) The new glider flew very well.

    That's how the Wright brothers were able to make their historic first flight at Kitty Hawk in 1903.

    Humans have long wanted to talk to machines - countless science fiction novels and movies bear witness.

    To create AI, OpenAI had to solve three problems. Computing infrastructure, models, and data. You can think of the computing infrastructure as the engine, the models as the wings, and the data as the controls.

    To manage the compute infrastructure, OpenAI used Kubernetes, but it wasn't something they could just grab and go. When they hit 2,500 nodes, they ran into problems with the Linux kernel ARP cache overflowing,[^3] and when they hit 7,500 nodes, they had to fix a bug in Kubernetes to enable anti-affinity.[^4]

    [^3]: Scaling Kubernetes to 2,500 nodes, January 18, 2018.

    [^4]: Scaling Kubernetes to 7,500 nodes, January 25, 2021.

    (Advertisement: Lablup's Backend.AI has already been used in practice on large clusters for AI training and inference, solving a number of scaling problems and implementing its own scheduling algorithms to support features like affinity and anti-affinity.)

    The scaling law for AI is the equivalent of the lift equation for an airplane. Just as the lift equation describes the lift of a wing as the area of the wing, the lift coefficient, and the proportionality constant, the scaling law describes the loss of an AI model as the size of the model, the size of the data, and the power law exponent.

    Just as the Wright brothers discovered that John Smeaton's constant of proportionality was 0.003, not 0.005, the power law exponent of scaling law was initially thought to be 0.73,[^5] but was actually found to be 0.50.[^6] The incorrect value was calculated because the learning rate was not adjusted for the size of the data.

    [^5]: Scaling Laws for Neural Language Models, January 23, 2020.

    [^6]: Training Compute-Optimal Large Language Models, March 29, 2022.

    OpenAI knew that control of the model was an important issue, so before we trained our first GPT, we were already working on reinforcement learning from human preferences,[^7] which we applied to the control of robots, reminiscent of the Wright brothers referencing bird flight for control of airplanes and first applying it to gliders instead of airplanes.

    [^7]: Deep reinforcement learning from human preferences, June 12, 2017.

    To apply this research to language models, human preference data was collected, resulting in InstructGPT.[^8] It's hard to know exactly because OpenAI hasn't published its research since GPT-4, but research is showing that it can learn not only from explicit feedback, but also from implicit feedback, such as users retrying to create or continuing and stopping conversations.[^9] If so, OpenAI could create a positive feedback loop where it improves its models to gather users, who then gather users to improve their models.

    [^8]: Training language models to follow instructions with human feedback, March 4, 2022.

    [^9]: Rewarding Chatbots for Real-World Engagement with Millions of Users, March 10, 2023.

    In this article, we've compared how humans went from flying to talking to machines, and we've seen some very similar patterns in the evolution of technology throughout history.

    What other examples will we see in the future as AI technology advances and we work to make it more accessible to more people? Can Rableup and Backend.AI help accelerate that process, allowing people to experiment and realize what we've learned from history more quickly? We're in the middle of this inflection point.

    This post is automatically translated from Korean

    12 July 2023

  • Developer Advocate in Lablup

    By Jongmin Kim

    The content of this post is based on my own personal experience, which is not necessarily true in all cases, and opinions may vary.

    What is DevRel?

    Image by Henning Westerkamp from Pixabay

    As the name implies, DevRel's primary purpose is to create relationships with developers. This includes not only the developers who write the code, but also the planners, designers, and all the production people involved in creating services and products. The specific role of a DevRel varies a lot depending on the nature of the organisation and the industry, but often the objectives converge to promote (product, technology) or recruit.

    The primary roles of a DevRel are to

    • Product promotion and technical evangelism
    • Build, run, and support the community
    • Gathering user input and feeding back into product development
    • Provide a variety of resources to help users better use the product

    Developer Advocate 은 DevRel 안에서도 기술 전파리소스 제공 에 조금 더 초점을 맞추고 있는 역할이라고 볼 수 있습니다.

    Developer Advocate can be seen as a role within DevRel that is a little more focused on skills evangelism and resource provision.

    Why DevRel?

    In order to make their products and services known to the public, companies conduct marketing and public relations activities through various channels and events. These activities are called PR - Public Relations, which literally means creating a relationship with the public. In companies that develop or service IT products, the main consumers are engineers (developers). And products aimed at engineers often need to provide more detailed and technical information, which requires different resources and strategies than general PR activities. DR - Developer Relations, as the name suggests.

    The community is arguably the most important driver of an IT ecosystem, especially for open source products like Backend.AI. Moreover, when new technologies of a similar kind come along, the size of the community is often a factor in choosing and adopting them, in addition to performance or maturity. One of the important roles of DevRel is to help and engage with these communities.

    A great analogy I like to use to explain the importance of community is the relationship between an artist and their fan club. An artist's primary role is to create music and perform, but it's through their fanbase that viral and secondary creations are born, and it's an important driver of their career and the entertainment business. Nowadays, entertainment companies are well aware of the importance of fandom and support various activities.

    Image created by Midjourney

    Similarly, the DevRel role in an IT organisation is to support and communicate with the various communities and ensure that the product and organisation don't lose focus.

    Developer Advocate in Lablup

    Lablup has been active in the Korean developer community for a long time by Wooyoung Yoo, and our CEO, Jeongkyu Shin, is also active in various developer communities, so we have already done a lot of DevRel activities for an organisation of our size. Backend.AI is a great piece of software, and it's an important tool, especially in this era of big AI. However, as a tool for AI learning and inference, it requires a lot of preparation and resources to set up, which can be daunting, especially if you're new to AI (which I am 😅).

    Since Backend.AI does not have a large user base yet, we have been conducting various community activities such as the [Open Source Contribution Academy] ( in conjunction with the theme of open source culture. In order for our own community to grow, information needs to be produced and disseminated among users in addition to the one-way transmission of technology from creators to users. In order for such activities to take place, it is my main task to continue to create and improve various resources and exchange opportunities that anyone can easily use. I will continue to strive to communicate with you in various opportunities and ways.

    This post is automatically translated from Korean

    24 March 2023

We're here for you!

Complete the form and we'll be in touch soon

Contact Us

Headquarter & HPC Lab

Namyoung Bldg. 4F/5F, 34, Seolleung-ro 100-gil, Gangnam-gu, Seoul, Republic of Korea

© Lablup Inc. All rights reserved.