AI Data Infrastructure

AI Data Infrastructure — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Color normalization

    Color normalization

    Color normalization is a topic in computer vision concerned with artificial color vision and object recognition. In general, the distribution of color values in an image depends on the illumination, which may vary depending on lighting conditions, cameras, and other factors. Color normalization allows for object recognition techniques based on color to compensate for these variations. == Main concepts == === Color constancy === Color constancy is a feature of the human internal model of perception, which provides humans with the ability to assign a relatively constant color to objects even under different illumination conditions. This is helpful for object recognition as well as identification of light sources in an environment. For example, humans see an object approximately as the same color when the sun is bright or when the sun is dim. === Applications === Color normalization has been used for object recognition on color images in the field of robotics, bioinformatics and general artificial intelligence, when it is important to remove all intensity values from the image while preserving color values. One example is in case of a scene shot by a surveillance camera over the day, where it is important to remove shadows or lighting changes on same color pixels and recognize the people that passed. Another example is automated screening tools used for the detection of diabetic retinopathy as well as molecular diagnosis of cancer states, where it is important to include color information during classification. == Known issues == The main issue about certain applications of color normalization is that the result looks unnatural or too distant from the original colors. In cases where there is a subtle variation between important aspects, this can be problematic. More specifically, the side effect can be that pixels become divergent and not reflect the actual color value of the image. A way of combating this issue is to use color normalization in combination with thresholding to correctly and consistently segment a colored image. == Transformations and algorithms == There is a vast array of different transformations and algorithms for achieving color normalization and a limited list is presented here. The performance of an algorithm is dependent on the task and one algorithm which performs better than another in one task might perform worse in another (no free lunch theorem). Additionally, the choice of the algorithm depends on the preferences of the user for the end-result, e.g. they may want a more natural-looking color image. === Grey world === The grey world normalization makes the assumption that changes in the lighting spectrum can be modelled by three constant factors applied to the red, green and blue channels of color. More specifically, a change in illuminated color can be modelled as a scaling α, β and γ in the R, G and B color channels and as such the grey world algorithm is invariant to illumination color variations. Therefore, a constancy solution can be achieved by dividing each color channel by its average value as shown in the following formula: ( α R , β G , γ B ) → ( α R α n ∑ i R , β G β n ∑ i G , γ B γ n ∑ i B ) {\displaystyle \left(\alpha R,\beta G,\gamma B\right)\rightarrow \left({\frac {\alpha R}{{\frac {\alpha }{n}}\sum _{i}R}},{\frac {\beta G}{{\frac {\beta }{n}}\sum _{i}G}},{\frac {\gamma B}{{\frac {\gamma }{n}}\sum _{i}B}}\right)} As mentioned above, grey world color normalization is invariant to illuminated color variations α, β and γ, however it has one important problem: it does not account for all variations of illumination intensity and it is not dynamic; when new objects appear in the scene it fails. To solve this problem there are several variants of the grey world algorithm. Additionally there is an iterative variation of the grey world normalization, however it was not found to perform significantly better. === Histogram equalization === Histogram equalization is a non-linear transform which maintains pixel rank and is capable of normalizing for any monotonically increasing color transform function. It is considered to be a more powerful normalization transformation than the grey world method. The results of histogram equalization tend to have an exaggerated blue channel and look unnatural, due to the fact that in most images the distribution of the pixel values is usually more similar to a Gaussian distribution, rather than uniform. === Histogram specification === Histogram specification transforms the red, green and blue histograms to match the shapes of three specific histograms, rather than simply equalizing them. It refers to a class of image transforms which aims to obtain images of which the histograms have a desired shape. As specified, firstly it is necessary to convert the image so that it has a particular histogram. Assume an image x. The following formula is the equalization transform of this image: y = f ( x ) = ∫ 0 x p x ( u ) d u {\displaystyle y=f(x)=\int \limits _{0}^{x}p_{x}(u)du} Then assume wanted image z. The equalization transform of this image is: y ′ = g ( z ) = ∫ 0 z p z ( u ) d u {\displaystyle y'=g(z)=\int \limits _{0}^{z}p_{z}(u)du} Of course p z ( u ) {\displaystyle p_{z}(u)} is the histogram of the output image. The formula to find the inverse of the above transform is: z = g − 1 ( y ′ ) {\displaystyle z=g^{-1}(y')} Therefore, since images y and y' have the same equalized histogram they are actually the same image, meaning y = y' and the transform from the given image x to the wanted image z is: z = g − 1 ( y ′ ) = g − 1 ( y ) = g − 1 ( f ( x ) ) {\displaystyle z=g^{-1}(y')=g^{-1}(y)=g^{-1}(f(x))} Histogram specification has the advantage of producing more realistic looking images, as it does not exaggerate the blue channel like histogram equalization. === Comprehensive Color Normalization === The comprehensive color normalization is shown to increase localization and object classification results in combination with color indexing. It is an iterative algorithm which works in two stages. The first stage is to use the red, green and blue color space with the intensity normalized, to normalize each pixel. The second stage is to normalize each color channel separately, so that the sum of the color components is equal to one third of the number of pixels. The iterations continue until convergence, meaning no additional changes. Formally: Normalize the color image f ( t ) = [ f i j ( t ) ] i = 1... N , j = 1... M {\displaystyle f^{(t)}=[f_{ij}^{(t)}]_{i=1...N,j=1...M}} which consists of color vectors f i j ( t ) = ( r i j ( t ) , g i j ( t ) , b i j ( t ) ) T . {\displaystyle f_{ij}^{(t)}=(r_{ij}^{(t)},g_{ij}^{(t)},b_{ij}^{(t)})^{T}.} For the first step explained above, compute: S i j := r i j ( t ) + g i j ( t ) + b i j ( t ) {\displaystyle S_{ij}:=r_{ij}^{(t)}+g_{ij}^{(t)}+b_{ij}^{(t)}} which leads to r i j ( t + 1 ) = r i j ( t ) S i j , g i j ( t + 1 ) = g i j ( t ) S i j {\displaystyle r_{ij}^{(t+1)}={\frac {r_{ij}^{(t)}}{S_{ij}}},g_{ij}^{(t+1)}={\frac {g_{ij}^{(t)}}{S_{ij}}}} and b i j ( t + 1 ) = b i j ( t ) S i j . {\displaystyle b_{ij}^{(t+1)}={\frac {b_{ij}^{(t)}}{S_{ij}}}.} For the second step explained above, compute: r ′ = 3 N M ∑ i = 1 N ∑ j = 1 M r i j ( t + 1 ) {\displaystyle r'={\frac {3}{NM}}\sum _{i=1}^{N}\sum _{j=1}^{M}r_{ij}^{(t+1)}} and normalize r i j ( t + 2 ) = r i j ( t + 1 ) r ′ . {\displaystyle r_{ij}^{(t+2)}={\frac {r_{ij}^{(t+1)}}{r'}}.} Of course the same process is done for b' and g'. Then these two steps are repeated until the changes between iteration t and t+2 are less than some set threshold. Comprehensive color normalization, just like the histogram equalization method previously mentioned, produces results that may look less natural due to the reduction in the number of color values.

    Read more →
  • StepFun

    StepFun

    Shanghai Jieyue Xingchen Intelligent Technology Co., Ltd, known as StepFun, is an artificial intelligence (AI) company based in Shanghai, China. It has been dubbed one of China's "AI Tiger" companies by investors. == Background == StepFun was founded in April 2023 by former Microsoft employees. Investors include Tencent, Qiming Venture Partners and Shanghai State-owned Capital Investment. In July 2025 at the World Artificial Intelligence Conference, StepFun announced the "Model-Chip Ecosystem Innovation Alliance" which consisted of Chinese developers of large language models (LLMs) and AI chip manufacturers. This included companies such as Huawei, Biren Technology, Moore Threads and Enflame. Another second alliance named the "Shanghai General Chamber of Commerce AI Committee" was also established that included StepFun, SenseTime, MiniMax, MetaX and Iluvatar CoreX. On 25 February 2026, it was reported that StepFun was seeking an initial public offering on the Hong Kong Stock Exchange. StepFun focuses on multimodal models which are designed to understand multiple types of input data such as text, video and audio. == Products == In July 2024 at the World Artificial Intelligence Conference, StepFun officially launched Step-2, a trillion-parameter LLM, along with the Step-1.5V multimodal model and the Step-1X image generation model. In February 2025, StepFun and Geely jointly announced the open-sourcing of two multimodal large models to global developers. They were Step-Video-T2V and Step-Audio. In July 2025, StepFun released Step 3. The Model-Chip Ecosystem Innovation Alliance aimed to optimize Step 3 for domestic chips. In April 2025, Step-R1-V-Mini was released. It is a multimodal reasoning model designed for visual interpretation and image understanding. In February 2026, Step-3.5-Flash, a mixture-of-experts model with 196 billion parameters and 11 billion active parameters was released under the free and open-source Apache 2.0 license. It supports tool use and a 256k token context window. == Models ==

    Read more →
  • METR

    METR

    Model Evaluation and Threat Research (METR) (MEE-tər), is a nonprofit research institute, based in Berkeley, California, that evaluates frontier AI models' capabilities to carry out long-horizon, agentic tasks that some researchers argue could pose catastrophic risks to society. METR has worked with leading AI companies to conduct pre-deployment model evaluations and contribute to system cards, including OpenAI's o3, o4-mini, GPT-4o and GPT-4.5, and Anthropic's Claude models. METR's CEO and founder is Beth Barnes, a former alignment researcher at OpenAI who left in 2022 to form ARC Evals, the evaluation division of Paul Christiano's Alignment Research Center. In December 2023, ARC Evals was spun off into an independent 501(c)(3) nonprofit and renamed METR. == Research == A substantial amount of METR's research is focused on evaluating the capabilities of AI systems to conduct research and development of AI systems themselves, including RE-Bench, a benchmark designed to test whether AIs can "solve research engineering tasks and accelerate AI R&D". === Doubling time estimates === In March 2025, METR published a paper noting that the length of software engineering tasks that the leading AI model could complete had a doubling time of around 7 months between 2019 and 2024. In January 2026, METR released a new version of their time horizon estimates model (Time Horizon 1.1). According to the updated model, the rate of progress of AI capabilities has increased since 2023, with a post-2023 doubling time estimated at 130.8 days (4.3 months). Progress is thus estimated to be 20% more rapid. === Time horizon measurements === METR releases a "task-completion time horizon" for analysed AI models. This measures the "task duration (measured by human expert completion time) at which an AI agent is predicted to succeed with a given level of reliability." The metric is reported in two variants: the 50%-time horizon, which gives the task duration at which an AI model is estimated to succeed 50% of the time, and the 80%-time horizon, which gives the task duration at which an AI model is estimated to succeed 80% of the time. METR has published two versions of the underlying model: Time Horizon 1.0 and Time Horizon 1.1, the latter introduced in January 2026. As of 9 May 2026, the best-performing model is Claude Mythos, with a 50%-time horizon of likely at least 16 hours and an 80%-time horizon of 3 hours and 6 minutes. METR notes that "[m]easurements above 16 [hours] are unreliable with [their] current task suite". The following table provides time horizon estimates ordered by each model's release date:

    Read more →
  • AI Dungeon

    AI Dungeon

    AI Dungeon is a single-player/multiplayer text adventure game which uses artificial intelligence (AI) to generate content and allows players to create and share adventures and custom prompts. The game's first version was made available in May 2019, and its second version (initially called AI Dungeon 2) was released on Google Colaboratory in December 2019. It was later ported that same month to its current cross-platform web application. The AI model was then reformed in July 2020. == Gameplay == AI Dungeon is a text adventure game that uses artificial intelligence to generate random storylines in response to player-submitted stimuli. In the game, players are prompted to choose a setting for their adventure (e.g. fantasy, mystery, apocalyptic, cyberpunk, zombies), followed by other options relevant to the setting (such as character class for fantasy settings). After beginning an adventure, four main interaction methods can be chosen for the player's text input: Do: Must be followed by a verb, allowing the player to perform an action. Say: Must be followed by dialogue sentences, allowing players to communicate with other characters. Story: Can be followed by sentences describing something that happens to progress the story, or that players want the AI to know for future events. See: Must be followed by a description, allowing the player to perceive events, objects, or characters. Using this command creates an AI generated image, and does not affect gameplay. The game adapts and responds to most actions the player enters. Providing blank inputs can be used to prompt the AI to generate further content, and the game also provides players with options to undo or redo or modify recent events to improve the game's narrative. Players can also tell the AI what elements to "remember" for reference in future parts of their playthrough. === User-generated content === In addition to AI Dungeon's pre-configured settings, players can create custom "adventures" from scratch by describing the setting in text format, which the AI will then generate a setting from. These custom adventures can be published for others to play, with an interface for browsing published adventures and leaving comments under them. === Multiplayer === AI Dungeon includes a multiplayer mode in which different players each have their own character and take turns interacting with the AI within the same game session. Multiplayer supports both online play across multiple devices or local play using a shared device. The game's hosts are able to supervise the AI and modify its output. Unlike the single-player game, in which actions and stories use second person narration, multiplayer game stories are presented using third-person narration. === Worlds === AI Dungeon allows players to set their adventures within specific "Worlds" that give context to the broader environment where the adventure takes place. This feature was first released with two different worlds available for selection: Xaxas, a "world of peace and prosperity"; and Kedar, a "world of dragons, demons, and monsters". == Development == === AI Dungeon Classic (Early GPT-2) === The first version of AI Dungeon (sometimes referred to as AI Dungeon Classic) was designed and created by Nick Walton of Brigham Young University's "Perception, Control, and Cognition" deep learning laboratory in March 2019 during a hackathon. Before this, Walton had been working as an intern for several companies in the field of autonomous vehicles. This creation used an early version of the GPT-2 natural-language-generating neural network, created by OpenAI, allowing it to generate its original adventure narratives. During his first interactions with GPT-2, Walton was partly inspired by the tabletop game Dungeons & Dragons (D&D), which he had played for the first time with his family a few months earlier: I realized that there were no games available that gave you the same freedom to do anything that I found in [Dungeons & Dragons] ... You can be so creative compared to other games. This led him to wonder if an AI could function as a dungeon master. Unlike later versions of AI Dungeon, the original did not allow players to specify any action they wanted. Instead, it generated a finite list of possible actions to choose from. This first version of the game was released to the public in May 2019. It is not to be confused with another GPT-2-based adventure game, GPT Adventure, created by Northwestern University neuroscience postgraduate student Nathan Whitmore, also released on Google Colab several months after the public release of AI Dungeon. === AI Dungeon 2 (Full GPT-2) === In November 2019, a new, "full" version of GPT-2 was released by OpenAI. This new model included support for 1.5 billion parameters (which determine the accuracy with which a machine learning model can perform a task), compared with the 126 million parameter version used in the earliest stages of AI Dungeon's development. The game was recreated by Walton, leveraging this new version of the model, and temporarily rebranded as AI Dungeon 2. AI Dungeon 2's AI was given more focused training compared to its predecessor, using genre-specific text. This training material included approximately 30 megabytes of content web-scraped from chooseyourstory.com (an online community website of content inspired by interactive gamebooks, written by contributors of multiple skill levels, using logic of differing complexity) and multiple D&D rulebooks and adventures. The new version was released in December 2019 as open-source software available on GitHub. It was accessible via Google Colab, an online tool for data scientists and AI researchers that allows for free execution of code on Google-hosted machines. It could also be run locally on a PC, but in both cases, it required players to download the full model, around 5 gigabytes of data. Within days of the initial release, this mandatory download resulted in bandwidth charges of over $20,000, forcing the temporary shut-down of the game until a peer-to-peer alternative solution was established. Due to the game's sudden and explosive growth that same month, however, it became closed-source, proprietary software and was relaunched by Walton's start-up development team, Latitude (with Walton taking on the role of CTO). This relaunch constituted mobile apps for iOS and Android (built by app developer Braydon Batungbacal) on December 17. Other members of this team included Thorsten Kreutz for the game's long-term strategy and the creator's brother, Alan Walton, for hosting infrastructure. At this time, Nick Walton also established a Patreon campaign to support the game's further growth (such as the addition of multiplayer and voice support, along with longer-term plans to include music and image content) and turn the game into a commercial endeavor, which Walton felt was necessary to cover the costs of delivering a higher-quality version of the game. AI Dungeon was one of the only known commercial applications to be based upon GPT-2. Following its first announcement in December 2019, a multiplayer mode was added to the game in April 2020. Hosting a game in this mode was originally restricted to premium subscribers, although any players could join a hosted game. === Dragon model release (GPT-3) === In July 2020, the developers introduced a premium-exclusive version of the AI model, named Dragon, which uses OpenAI's API for leveraging the GPT-3 model without maintaining a local copy (released on June 11, 2020). GPT-3 was trained with 570 gigabytes of text content (approximately one trillion words, with a $12 million development cost) and can support 175 billion parameters, compared to the 40 gigabytes of training content and 1.5 billion parameters of GPT-2. The free model was also upgraded to a less-advanced version of GPT-3 and was named Griffin. Speaking shortly after this release, on the differences between GPT-2 and GPT-3, Walton stated: [GPT-3 is] one of the most powerful AI models in the world... It's just much more coherent in terms of understanding who the characters are, what they're saying, what's going on in the story and just being able to write an interesting and believable story. In the latter half of 2020, the "Worlds" feature was added to AI Dungeon, providing players with a selection of overarching worlds in which their adventures can take place. In February 2021, it was announced that AI Dungeon's developers, Latitude, had raised $3.3 million in seed funding (led by NFX, with participation from Album VC and Griffin Gaming Partners) to "build games with 'infinite' story possibilities." This funding intended to move AI content creation beyond the purely text-based nature of AI Dungeon as it existed at the time. After its announcement on August 20, a new "See" interaction mode was made available for all players and added to the game on August 30, 2022. AI Dungeon was retired from Steam on March 12, 2024. == Reception == Approximate

    Read more →
  • Anna Becker

    Anna Becker

    Anna Becker is an Israeli researcher known in the field of artificial intelligence and computer science within the financial field. == Early life and education == Becker was born in Russia and immigrated to Israel at 16 after graduating from a school in Moscow. At 17, she began her studies at Technion – Israel Institute of Technology. During her master's degree in computer science, she taught first-year students of the same course, and at 27, Becker completed her PhD in Computer Science and Artificial Intelligence. == Career == While pursuing her PhD, Becker resolved an NP-complete approximation algorithm that had been unresolved for over twenty years. This made her a recognized scholar in the field. After completing her PhD, she developed an approximation technique by a factor of two. This technique is widely used today in operating systems, database systems, and VLSI chip designs. She then founded and sold Strategy Runner, a fintech software. After this, she founded EndoTech, an algorithmic trading platform based on artificial intelligence and machine learning. EndoTech's trading strategies have been operating in live cryptocurrency markets since 2017. The platform's BTC Alpha strategy has reported an average annual return of 163% on fixed capital over eight years of live operation, with a maximum drawdown of 14% and a trade accuracy rate of approximately 83%. In 2026, EndoTech entered a partnership with Bit1 Exchange to make its BTC Alpha and ETH Alpha copy trading strategies accessible to retail investors with no minimum deposit requirement, through a full-custody model in which user funds remain in their own exchange wallets at all times.As of 2023, Becker is working on Fianchetto Fund, an AI-based investing analysis platform. Becker has also co-authored a book on Bayesian networks, which has been published widely in the field of computer science and artificial intelligence.

    Read more →
  • Personal knowledge base

    Personal knowledge base

    A personal knowledge base (PKB) is an electronic tool used by an individual to express, capture, and later retrieve personal knowledge. It differs from a traditional database in that it contains subjective material particular to the owner, that others may not agree with nor care about. Importantly, a PKB consists primarily of knowledge, rather than information; in other words, it is not a collection of documents or other sources an individual has encountered, but rather an expression of the distilled knowledge the owner has extracted from those sources or from elsewhere. The term personal knowledge base was mentioned as early as the 1980s, but the term came to prominence in the 2000s when it was described at length in publications by computer scientist Stephen Davies and colleagues, who compared PKBs on a number of different dimensions, the most important of which is the data model that each PKB uses to organize knowledge. == Data models == Davies and colleagues examined three aspects of the data models of PKBs: their structural framework, which prescribes rules about how knowledge elements can be structured and interrelated (as a tree, graph, tree plus graph, spatially, categorically, as n-ary links, chronologically, or ZigZag); their knowledge elements, or basic building blocks of information that a user creates and works with, and the level of granularity of those knowledge elements (such as word/concept, phrase/proposition, free text notes, links to information sources, or composite); and their schema, which involves the level of formal semantics introduced into the data model (such as a type system and related schemas, keywords, attribute–value pairs, etc.). Davies and colleagues also emphasized the principle of transclusion, "the ability to view the same knowledge element (not a copy) in multiple contexts", which they considered to be "pivotal" to an ideal PKB. They concluded, after reviewing many design goals, that the ideal PKB was still to come in the future. === Personal knowledge graph === In their publications on PKBs, Davies and colleagues discussed knowledge graphs as they were implemented in some software of the time. Later, other writers used the term personal knowledge graph (PKG) to refer to a PKB featuring a graph structure and graph visualization. However, the term personal knowledge graph is also used by software engineers to refer to the different subject of a knowledge graph about a person, in contrast to a knowledge graph created by a person in a PKB. == Software architecture == Davies and colleagues also differentiated PKBs according to their software architecture: file-based, database-based, or client–server systems (including Internet-based systems accessed through desktop computers and/or handheld mobile devices). == History == Non-electronic personal knowledge bases have probably existed in some form for centuries: Leonardo da Vinci's journals and notes are a famous example of the use of notebooks. Commonplace books, florilegia, annotated private libraries, and card files (in German, Zettelkästen) of index cards and edge-notched cards are examples of formats that have served this function in the pre-electronic age. Undoubtedly the most famous early formulation of an electronic PKB was Vannevar Bush's description of the "memex" in 1945. In a 1962 technical report, human–computer interaction pioneer Douglas Engelbart (who would later become famous for his 1968 "Mother of All Demos" that demonstrated almost all the fundamental elements of modern personal computing) described his use of edge-notched cards to partially model Bush's memex. == Examples == The following software applications have been used to build PKBs using various data models and architectures. The list includes software mentioned by Davies and colleagues in their 2005 paper, and additional software. Open source Compendium Haystack (MIT project) Joplin Logseq NoteCards Org-mode QOwnNotes TiddlyWiki Closed source Evernote Microsoft OneNote MindManager MyLifeBits Notion Obsidian Personal Knowbase PersonalBrain Roam Tinderbox

    Read more →
  • Mike Vernal

    Mike Vernal

    Mike Vernal (born September 7, 1980) is an American business executive who is a venture capitalist at Conviction. He was previously an investor at Sequoia Capital in Silicon Valley and was one of the top executives at Facebook between 2008 and 2016. Prior to joining Sequoia Capital, he was Vice President of Search, Local, and Developer products at Facebook. == Career == Vernal joined Facebook in 2008. From 2009 to 2013, Vernal managed the Facebook Platform team and is credited with managing the Facebook Platform transition from desktop to mobile. During his time at Facebook, he served as vice president and was considered among the “top executives” who ran the company. In 2016, after eight years at Facebook, Vernal announced his plans to leave the company. In May 2016, he joined Sequoia Capital, a venture-capital firm specializing in technology startups. He is an early investor in Rippling, Clay, Notion and Statsig. In July 2023, The Information reported that Vernal was departing Sequoia. At Conviction, he has led investments in Listen Labs, OpenEvidence and Thinking Machines Lab.

    Read more →
  • REEM

    REEM

    REEM is a prototype humanoid robot built by PAL Robotics in Spain. It is a 1.70 m high humanoid robot with 22 degrees of freedom, with a mobile base with wheels, allowing it to move at 4 km/hour. The upper part of the robot consists of a torso with a touch screen, two motorized arms, which give it a high degree of expression, and a head, which is also motorized. REEM-A and REEM-B are the first and second prototypes of humanoid robots created by PAL Robotics. REEM-B can recognize, grasp and lift objects and walk by itself, avoiding obstacles through simultaneous localization and mapping. The robot accepts voice commands and can recognize faces. == Specifications ==

    Read more →
  • BERT (language model)

    BERT (language model)

    Bidirectional encoder representations from transformers (BERT) is a language model introduced in October 2018 by researchers at Google. It learns to represent text as a sequence of vectors using self-supervised learning. It uses the encoder-only transformer architecture. BERT dramatically improved the state of the art for large language models. As of 2020, BERT is a ubiquitous baseline in natural language processing (NLP) experiments. BERT is trained by masked token prediction and next sentence prediction. With this training, BERT learns contextual, latent representations of tokens in their context, similar to ELMo and GPT-2. It found applications for many natural language processing tasks, such as coreference resolution and polysemy resolution. It improved on ELMo and spawned the study of "BERTology", which attempts to interpret what is learned by BERT. BERT was originally implemented in the English language at two model sizes, BERTBASE (110 million parameters) and BERTLARGE (340 million parameters). Both were trained on the Toronto BookCorpus (800M words) and English Wikipedia (2,500M words). The weights were released on GitHub. On March 11, 2020, 24 smaller models were released, the smallest being BERTTINY with just 4 million parameters. == Architecture == BERT is an "encoder-only" transformer architecture. At a high level, BERT consists of 4 modules: Tokenizer: This module converts a piece of English text into a sequence of integers ("tokens"). Embedding: This module converts the sequence of tokens into an array of real-valued vectors representing the tokens. It represents the conversion of discrete token types into a lower-dimensional Euclidean space. Encoder: a stack of Transformer blocks with self-attention, but without causal masking. Task head: This module converts the final representation vectors into one-shot encoded tokens again by producing a predicted probability distribution over the token types. It can be viewed as a simple decoder, decoding the latent representation into token types, or as an "un-embedding layer". The task head is necessary for pre-training, but it is often unnecessary for so-called "downstream tasks," such as question answering or sentiment classification. Instead, one removes the task head and replaces it with a newly initialized module suited for the task, and finetune the new module. The latent vector representation of the model is directly fed into this new module, allowing for sample-efficient transfer learning. === Embedding === This section describes the embedding used by BERTBASE. The other one, BERTLARGE, is similar, just larger. The tokenizer of BERT is WordPiece, which is a sub-word strategy like byte-pair encoding. Its vocabulary size is 30,000, and any token not appearing in its vocabulary is replaced by [UNK] ("unknown"). The first layer is the embedding layer, which contains three components: token type embeddings, position embeddings, and segment type embeddings. Token type: The token type is a standard embedding layer, translating a one-hot vector into a dense vector based on its token type. Position: The position embeddings are based on a token's position in the sequence. BERT uses absolute position embeddings, where each position in a sequence is mapped to a real-valued vector. Each dimension of the vector consists of a sinusoidal function that takes the position in the sequence as input. Segment type: Using a vocabulary of just 0 or 1, this embedding layer produces a dense vector based on whether the token belongs to the first or second text segment in that input. In other words, type-1 tokens are all tokens that appear after the [SEP] special token. All prior tokens are type-0. The three embedding vectors are added together representing the initial token representation as a function of these three pieces of information. After embedding, the vector representation is normalized using a LayerNorm operation, outputting a 768-dimensional vector for each input token. After this, the representation vectors are passed forward through 12 Transformer encoder blocks, and are decoded back to 30,000-dimensional vocabulary space using a basic affine transformation layer. === Architectural family === The encoder stack of BERT has 2 free parameters: L {\displaystyle L} , the number of layers, and H {\displaystyle H} , the hidden size. There are always H / 64 {\displaystyle H/64} self-attention heads, and the feed-forward/filter size is always 4 H {\displaystyle 4H} . By varying these two numbers, one obtains an entire family of BERT models. For BERT: the feed-forward size and filter size are synonymous. Both of them denote the number of dimensions in the middle layer of the feed-forward network. the hidden size and embedding size are synonymous. Both of them denote the number of real numbers used to represent a token. The notation for encoder stack is written as L/H. For example, BERTBASE is written as 12L/768H, BERTLARGE as 24L/1024H, and BERTTINY as 2L/128H. == Training == === Pre-training === BERT was pre-trained simultaneously on two tasks: Masked language modeling (MLM): In this task, BERT ingests a sequence of words, where one word may be randomly changed ("masked"), and BERT tries to predict the original words that had been changed. For example, in the sentence "The cat sat on the [MASK]," BERT would need to predict "mat." This helps BERT learn bidirectional context, meaning it understands the relationships between words not just from left to right or right to left but from both directions at the same time. Next sentence prediction (NSP): In this task, BERT is trained to predict whether one sentence logically follows another. For example, given two sentences, "The cat sat on the mat" and "It was a sunny day", BERT has to decide if the second sentence is a valid continuation of the first one. This helps BERT understand relationships between sentences, which is important for tasks like question answering or document classification. ==== Masked language modeling ==== In masked language modeling, 15% of tokens would be randomly selected for masked-prediction task, and the training objective was to predict the masked token given its context. In more detail, the selected token is: replaced with a [MASK] token with probability 80%, replaced with a random word token with probability 10%, not replaced with probability 10%. The reason not all selected tokens are masked is to avoid the dataset shift problem. The dataset shift problem arises when the distribution of inputs seen during training differs significantly from the distribution encountered during inference. A trained BERT model might be applied to word representation (like Word2Vec), where it would be run over sentences not containing any [MASK] tokens. It is later found that more diverse training objectives are generally better. As an illustrative example, consider the sentence "my dog is cute". It would first be divided into tokens like "my1 dog2 is3 cute4". Then a random token in the sentence would be picked. Let it be the 4th one "cute4". Next, there would be three possibilities: with probability 80%, the chosen token is masked, resulting in "my1 dog2 is3 [MASK]4"; with probability 10%, the chosen token is replaced by a uniformly sampled random token, such as "happy", resulting in "my1 dog2 is3 happy4"; with probability 10%, nothing is done, resulting in "my1 dog2 is3 cute4". After processing the input text, the model's 4th output vector is passed to its decoder layer, which outputs a probability distribution over its 30,000-dimensional vocabulary space. ==== Next sentence prediction ==== Given two sentences, the model predicts if they appear sequentially in the training corpus, outputting either [IsNext] or [NotNext]. During training, the algorithm sometimes samples two sentences from a single continuous span in the training corpus, while at other times, it samples two sentences from two discontinuous spans. The first sentence starts with a special token, [CLS] (for "classify"). The two sentences are separated by another special token, [SEP] (for "separate"). After processing the two sentences, the final vector for the [CLS] token is passed to a linear layer for binary classification into [IsNext] and [NotNext]. For example: Given "[CLS] my dog is cute [SEP] he likes playing [SEP]", the model should predict [IsNext]. Given "[CLS] my dog is cute [SEP] how do magnets work [SEP]", the model should predict [NotNext]. === Fine-tuning === BERT is meant as a general pretrained model for various applications in natural language processing. That is, after pre-training, BERT can be fine-tuned with fewer resources on smaller datasets to optimize its performance on specific tasks such as natural language inference and text classification, and sequence-to-sequence-based language generation tasks such as question answering and conversational response generation. The original BERT paper published results demonstrating that a small amount of fine

    Read more →
  • Six Little Dragons

    Six Little Dragons

    Six Little Dragons (Chinese: 杭州六小龙), or Six Little Dragons of Hangzhou, are an informal grouping of the tech startups Game Science, DeepSeek, Unitree Robotics, DEEP Robotics, BrainCo and Manycore Tech. All six were established in Hangzhou, They are active in artificial intelligence, robotics, gaming, and brain-computer interface technology. Hangzhou is referred to as the China’s “e-commerce capital” (电商之都). The nickname "Six Little Dragons" originated from the Chinese internet. == Background == === Chinese government investments (2002 — 2010s) === From 2002 to 2007, under Xi Jinping's leadership as party secretary of Zhejiang, provincial spending on technology research grew over four times to 28 billion RMB. The province launched "Digital Zhejiang" (数字浙江) to advance modernization and the "Eight Eight Strategy" (八八战略), focusing on eight advantages and actions to boost industrial development, including specialized industries. In 2010, Hangzhou's government started "Project Eagle" (雏鹰计划) to aid science and technology startups. The project works with incubators and accelerators to find promising tech companies and offers public funding and other help, especially for startups by graduates and returning students. Unitree received support in the initial phase, along with government subsidies from Binjiang District. === AI-startups and further investments (2025 — present) === In January 2025, the Chinese government created the "Hangzhou AI Industry Chain High-Quality Development Action Plan" which focuses on computing power, LLM technologies, and AI applications. The plan was made to certify over 2,000 new high-tech enterprises, initiate over 300 major tech projects, and invest more than 300 billion RMB (US$40 billion) annually. The Chinese government also renewed "Project Eagle" and to allocate 15% of industrial policy funds for future industries. Hangzhou aimed to become a center for tech startups, highlighting the "six little dragons of Hangzhou," a nickname popularized in early 2025. This group includes DeepSeek, Game Science, Unitree Robotics, Manycore Tech, BrainCo, and DEEP Robotics, companies in gaming, robotics, and software development. Earlier in 2025, DeepSeek, one of the six dragons, launched an AI system at a much lower cost than those from Silicon Valley. Since then, DeepSeek and Alibaba have produced top-performing open source AI models. Game Science launched the successful video game Black Myth: Wukong in 2024, while Unitree gained attention for their dancing robots in the 2025 annual spring gala broadcast by Chinese state media. The group was acknowledged by Chinese authorities in Hangzhou in a New Years message for local businesses in January 2025. Hangzhou’s universities were given credit for the development of Chinese technological industry. Zhejiang University alumni founded three of the "Six Little Dragons". By September 2024, the university produced 102 executives in Chinese AI start-ups, ranking third among China's top institutions. On February 20, 2025, Alibaba's Eddie Wu stated that the company would focus on artificial generative intelligence and plans significant investment in AI. The company also sought to boost foreign investment to China's "Six Little Dragons" following Alibaba's founder Jack Ma attended General Secretary of the Chinese Communist Party Xi Jinping's business symposium with corporate leaders and entrepreneurs that same month. == Challenges == China's net foreign direct investment (FDI) fell by US$168 billion in 2024, marking the largest capital flight since 1990. Foreign investment peaked at US$344 billion in 2021 but has since declined according to the State Administration of Foreign Exchange. In 2024, foreign investors put in only US$4.5 billion while Chinese firms invested US$173 billion abroad. According to interviews conducted by The New York Times, some start-up company founders believe that Chinese government's support for Hangzhou's technological sector has deterred foreign investors. Tensions with the United States led many international companies to adopt a China Plus One strategy, while Chinese firms build factories overseas to avoid potential Trump tariffs. China also faced US restrictions on its access of advanced chips, forcing Chinese tech companies to stockpile Nvidia chips while Chinese producers like Huawei and Semiconductor Manufacturing International Corporation (SMIC) were competing to produce their own.

    Read more →
  • Journal of Experimental and Theoretical Artificial Intelligence

    Journal of Experimental and Theoretical Artificial Intelligence

    The Journal of Experimental and Theoretical Artificial Intelligence is a quarterly peer-reviewed scientific journal published by Taylor and Francis. It covers all aspects of artificial intelligence and was established in 1989. The editor-in-chief is Eric Dietrich (Binghamton University), the deputy editors-in-chief are Li Pheng Khoo (School of Mechanical & Aerospace Engineering, Nanyang Technological University) and Antonio Lieto (Department of Computer Science, University of Turin). == Abstracting and indexing == The journal is abstracted and indexed in: According to the Journal Citation Reports, the journal has a 2020/2021 impact factor of 2.340 .

    Read more →
  • Production Rule Representation

    Production Rule Representation

    The Production Rule Representation (PRR) is a proposed standard of the Object Management Group (OMG) that aims to define a vendor-neutral model for representing production rules within the Unified Modeling Language (UML), specifically for use in forward-chaining rule engines. == History == The OMG set up a Business Rules Working Group in 2002 as the first standards body to recognize the importance of the "Business Rules Approach". It issued 2 main RFPs in 2003 – a standard for modeling production rules (PRR), and a standard for modeling business rules as business documentation (BSBR, now SBVR). PRR was mostly defined by and for vendors of Business Rule Engines (BREs) (sometimes termed Business Rules Engine(s), like in Wikipedia). Contributors have included all the major BRE vendors, members of RuleML, and leading UML vendors. == Evolution == The PRR RFP originally suggested that PRR use a combination of UML OCL and Action Semantics for rule conditions and actions. However, expecting modellers to learn 2 relatively obscure UML languages in order to define a production rule proved unpalatable. Therefore, PRR OCL was defined that included OCL extensions for simple rule actions (as well as external functions). PRR OCL is currently considered "non-normative" i.e. is not part of the PRR standard per se. PRR beta applies just to a PRR Core that excludes an explicit expression language. The PRR RFP envisaged covering both forward and backward chaining rule engines. However, the lack of vendor support for / interest in backward chaining caused this to be revise to forward chaining and "sequential" semantics. The latter is simply the scripting mode provided by many BPM tools, where rules are listed and executed sequentially as if programmed. This provides PRR with better compatibility with typical BPM scripting engines (and acknowledges the fact that most BREs today support a "sequential" mode of operation, improving performance in some circumstances). == Status == PRR is currently at version 1.0.

    Read more →
  • Curve (tonality)

    Curve (tonality)

    In image editing, a curve is a remapping of image tonality, specified as a function from input level to output level, used as a way to emphasize colours or other elements in a picture. Curves can usually be applied to all channels together in an image, or to each channel individually. Applying a curve to all channels typically changes the brightness in part of the spectrum. Light parts of a picture can be easily made lighter and dark parts darker to increase contrast. Applying a curve to individual channels can be used to stress a colour. This is particularly efficient in the Lab colour space due to the separation of luminance and chromaticity, but it can also be used in RGB, CMYK or whatever other colour models the software supports.

    Read more →
  • Ishikawa diagram

    Ishikawa diagram

    Ishikawa diagrams (also called fishbone diagrams, herringbone diagrams, cause-and-effect diagrams) are causal diagrams created by Kaoru Ishikawa that show the potential causes of a specific event. Common uses of the Ishikawa diagram are product design and quality defect prevention to identify potential factors causing an overall effect. Each cause or reason for imperfection is a source of variation. Causes are usually grouped into major categories to identify and classify these sources of variation. == Overview == The defect, or the problem to be solved, is shown as the fish's head, facing to the right, with the causes extending to the left as fishbones; the ribs branch off the backbone for major causes, with sub-branches for root-causes, to as many levels as required. Ishikawa diagrams were popularized in the 1960s by Kaoru Ishikawa, who pioneered quality management processes in the Kawasaki shipyards, and in the process became one of the founding fathers of modern management. The basic concept was first used in the 1920s, and is considered one of the seven basic tools of quality control. It is known as a fishbone diagram because of its shape, similar to the side view of a fish skeleton. Mazda Motors famously used an Ishikawa diagram in the development of the Miata (MX5) sports car. == Root causes == Root-cause analysis is intended to reveal key relationships among various variables, and the possible causes provide additional insight into process behavior. It shows high-level causes that lead to the problem encountered by providing a snapshot of the current situation. There can be confusion about the relationships between problems, causes, symptoms and effects. Smith highlights this and the common question “Is that a problem or a symptom?” which mistakenly presumes that problems and symptoms are mutually exclusive categories. A problem is a situation that bears improvement; a symptom is the effect of a cause: a situation can be both a problem and a symptom. At a practical level, a cause is whatever is responsible for, or explains, an effect - a factor "whose presence makes a critical difference to the occurrence of an outcome". The causes emerge by analysis, often through brainstorming sessions, and are grouped into categories on the main branches off the fishbone. To help structure the approach, the categories are often selected from one of the common models shown below, but may emerge as something unique to the application in a specific case. Each potential cause is traced back to find the root cause, often using the 5 Whys technique. Typical categories include: === The 5 Ms (used in manufacturing) === Originating with lean manufacturing and the Toyota Production System, the 5 Ms is one of the most common frameworks for root-cause analysis: Manpower / Mindpower (physical or knowledge work, includes: kaizens, suggestions) Machine (equipment, technology) Material (includes raw material, consumables, and information) Method (process) Measurement / medium (inspection, environment) These have been expanded by some to include an additional three, and are referred to as the 8 Ms: Mission / mother nature (purpose, environment) Management / money power (leadership) Maintenance === The 8 Ps (used in product marketing) === This common model for identifying crucial attributes for planning in product marketing is often also used in root-cause analysis as categories for the Ishikawa diagram: Product (or service) Price Place Promotion People (personnel) Process Physical evidence (proof) Performance === The 4 or 5 Ss (used in service industries) === An alternative used for service industries, uses four categories of possible cause: Surroundings: Refers to the environment in which the process occurs. Suppliers: Refers to external parties that provide inputs—raw materials, components, or services. Systems: Refers to the procedures, processes, and technologies used to perform the work. Skill: Refers to the human factor, particularly the knowledge and abilities of employees. Safety: Refers to physical and psychological well-being in the workplace. == Use in specific industries == The Ishikawa diagram has been widely adopted across various industries as an effective tool for root cause analysis in quality, efficiency, and safety-related issues. Its versatility allows it to be applied in both manufacturing and service contexts. In the manufacturing industry, particularly in the automotive and electronics sectors, the diagram is frequently used in continuous improvement initiatives such as Six Sigma and Lean Manufacturing. Quality teams use it to identify causes related to materials, methods, machinery, manpower, environment, and measurement, facilitating informed decision-making to reduce defects and optimize processes. In the food industry, the Ishikawa diagram is applied to analyze issues related to food safety, temperature control, cross-contamination, and regulatory compliance. Its use enables companies to identify improvement opportunities in production, packaging, and distribution stages. In the pharmaceutical sector, it is a key tool in process validation, quality control, and compliance with Good Manufacturing Practices (GMP). It helps visualize factors affecting product quality from formulation to storage. It has also been successfully implemented in sectors such as aerospace, pulp and paper, construction, education, and healthcare, where it supports structured problem-solving and promotes continuous improvement and a culture of quality.

    Read more →
  • GPT-5.3-Codex

    GPT-5.3-Codex

    GPT-5.3-Codex (Generative Pre-trained Transformer 5.3 Codex) is a large language model (LLM) announced and released by OpenAI on February 5, 2026. It is made as a competitor to Claude's Opus 4.6, focusing on code generation, speed and the ability to search repositories, run terminal commands and at the same time, debug code. In technical benchmarks, it is reported that GPT-5.3 Codex is 25% faster than Opus 4.6. GPT-5.3 Codex is available in the Codex app and on the web; access via API is also planned. According to OpenAI, GPT-5.3-Codex is the company's "first model that was instrumental in creating itself." On February 12, 2026, GPT-5.3-Codex-Spark was released in a research preview, which is a smaller version of GPT-5.3-Codex which supports text-only input. As of February 2026, GPT-5.3-Codex is only available for ChatGPT Pro ($200/month) subscribers.

    Read more →