Tokenization - freeCodeCamp.org

How to Implement Tokenization using JWT and Django Rest Framework

Velda Kiara — Thu, 23 Feb 2023 15:53:00 +0000

When I was a young girl, we used to have sports competitions like running a hundred meters, relays, swimming, and basketball games.

My strengths were swimming and basketball. I went home with many gifts or, as my school's game master said, a token of appreciation.

A token is a secure form used to transmit data between two parties. The medals I got from the competitions hold a lot of value for me, and if I gave them to someone else, they would merely be trinkets.

JavaScript Object Notation (JSON) is a format used to present structured data based on JavaScript syntax. We use it to transmit data in web applications by sending the data from the server to the client's display.

JWT (JSON Web Token) is a form of transmitting a JSON object as information between parties. Let's learn more about what JWTs are and how they work.

The Importance of JWTs

JWTs are important for two main reasons:

Authorization: in our competitions, we had to present our school identification cards for verification before getting our medals.

Our school IDs acted like login requests in applications. In apps, requests carry a JWT that gives users permission to access any resources that are accessible with that token.

Information exchange: my medals were a badge of honor and a way to get a certificate signed and stamped by our games master for legitimacy.

We use JWTs to exchange information in cases where they are signed – for example using public-private key pairs to make sure that the integrity of the information is not compromised since the payload and header are used to compute signatures.

How Do JWTs Work?

A JWT is an authorization token that is included in requests. Here's an example of what one looks like:

eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNjczNTI3ODMzLCJpYXQiOjE2NzM1Mjc0MzAsImp0aSI6IjNkMGRkMGZiZjA5ZjRmZWU4MTZmMGQyOTQ5OWU3ZmFmIiwidXNlcl9pZCI6IjFmYTBiMGJkLWY4MmMtNDQzNy1iMmViLTMwOTYzMGZkNzQ2NiJ9.-swqFh4MCecycmodQfO8ZmfsDJ3DqoZBsdNzEWhfzhA

You get a JWT by logging in with a username and password. In exchange, the server returns an access token and a refresh token, each in the form of a JWT. The client then sends the access token with its requests to access resources on the server.

The lifetimes of the two tokens differ: access tokens typically last for five minutes or less, while refresh tokens can last for 24 hours. But you can customize the lifetimes of both types of tokens.

If the access token expires, the client uses the refresh token to request a new access token from the server. Once the refresh token expires, the user must log in again with their username and password to get a new pair of tokens.

It works this way to prevent damage that can occur when a token is compromised and to prevent unauthorized access.

Different parts of a JWT

JWTs hold information in three parts, as you can see in the following code blocks:

header.payload.signature

header = eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9
payload = eyJ0b2tlbl90eXBlIjoiYWNjZXNzIiwiZXhwIjoxNTQzODI4NDMxLCJqdGkiOiI3ZjU5OTdiNzE1MGQ0NjU3OWRjMmI0OTE2NzA5N2U3YiIsInVsztZXJfaWQiOjF9
signature = Ju70kdcaHKn1Qaz8H42zrOYk0Jx9kIciuhkTn9Xx7vhikY

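The above JWT is encoded using Base64URL, a URL-safe variant of Base64. It is encoded, not encrypted, so anyone holding the token can read it. Once decoded, the information will include something similar to the following parts: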

Header

The header contains:

typ (type): the specification that the token is a JWT
alg (algorithm): the signing algorithm used to sign said token

Algorithms that are used to sign include HMAC with SHA-256 (HS256) and RSA with SHA-256 (RS256). SHA-256 on its own is a hash function, not a signing algorithm. The signatures for the tokens serve two purposes: integrity and authenticity.

An example of a header with the algorithm and type is as shown below:

{
  "alg": "HS256",
  "typ": "JWT"
}
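To see that the header is only encoded, here is a minimal sketch that decodes the first segment of the example token using Python's standard library. The helper name decode_segment is hypothetical and not part of any JWT library.

```python
import base64
import json


def decode_segment(segment: str) -> dict:
    """Decode one Base64URL-encoded JWT segment (header or payload) into a dict."""
    # JWTs drop the trailing "=" padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


# First segment of the example token shown earlier in this article.
header = decode_segment("eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9")
print(header)
```

The same helper works for the payload segment, since both parts are encoded the same way.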

Payload

The payload contains the intended messages which are commonly known as claims and metadata, as well as any other information.

There are three types of claims:

Registered claims: they include exp (expiration time), iss (issuer), sub (subject) and aud (audience). They are highly recommended since they provide information on the use and condition of use of the token.
Public claims: these are claims you can define yourself, but their names should be unique (for example, registered or namespaced) to avoid collisions with other services that use JWT.
Private claims: these are claims that are used specifically between two parties that understand the meaning and use. Like the example of my medals, my games master and I understood the value.

Below is an example of what a payload looks like.

{
  "token_type": "access",
  "exp": 1543828431,
  "jti": "7f5997b7150d46579dc2b49167097e7b",
  "user_id": 4
}

token_type is a label that shows what kind of token this is. In this case, it's an access token.
exp stands for expiration. It's the time the token will stop working. In this case the number represents a date and time in Unix time (seconds since 1 January 1970 UTC).
jti stands for JWT ID. It's a unique identifier for this specific token. The ID is used to keep track of which tokens have been used, to prevent use of the same token more than once.
user_id is an identifier of the user this token belongs to. In this case, the number 4 is the user identification.
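As a rough illustration of how the exp claim is used, the sketch below converts the Unix timestamp from the payload above into a readable date and checks whether the token has expired. The function name is_expired is an assumption made for this example. A real JWT library performs this check for you.

```python
from datetime import datetime, timezone

payload = {
    "token_type": "access",
    "exp": 1543828431,
    "jti": "7f5997b7150d46579dc2b49167097e7b",
    "user_id": 4,
}


def is_expired(claims: dict, now: datetime) -> bool:
    """Return True when the token's exp timestamp is at or before `now`."""
    expires_at = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)
    return now >= expires_at


# Convert the Unix timestamp into a timezone-aware UTC datetime.
expires_at = datetime.fromtimestamp(payload["exp"], tz=timezone.utc)
print(expires_at.isoformat())
print(is_expired(payload, datetime.now(timezone.utc)))
```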

Signature

The signature lets the server check that a token was issued by a trusted party and has not been altered. It is issued by the JWT backend by signing the Base64URL-encoded header and payload, joined by a dot, with the SECRET_KEY.

The signature is verified for each request. To validate the signature, the server recomputes it with the SECRET_KEY and compares the result. Remember, the purpose of calling it SECRET_KEY is that it stays secret.
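To make the signing step concrete, here is a minimal sketch of HS256 signing and verification using only Python's standard library. The SECRET_KEY value and the helper names b64url, sign, and verify are assumptions for illustration. In a real project, Simple JWT handles this for you.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = "not-a-real-secret"  # hypothetical value, for illustration only


def b64url(data: bytes) -> str:
    """Base64URL-encode bytes and drop the "=" padding, as JWTs do."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign(header: dict, payload: dict, key: str) -> str:
    """Build a compact header.payload.signature token signed with HS256."""
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    digest = hmac.new(key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(digest)}"


def verify(token: str, key: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(key.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)


token = sign({"alg": "HS256", "typ": "JWT"}, {"user_id": 4}, SECRET_KEY)
print(verify(token, SECRET_KEY))        # signature matches
print(verify(token, "some-other-key"))  # wrong key, signature does not match
```

Because the signature covers both the header and the payload, changing either part without knowing the key makes verification fail.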

Now let's see how JWTs work in practice with an example.

Project Setup

To illustrate how JWTs work, I will use Simple JWT, which is a JSON Web Token authentication plugin for the Django Rest Framework (DRF).

Prerequisites

To follow along, you should have some prior knowledge of HTML files and how to set up a Django project. You should also be familiar with what an API (Application Programming Interface) is.

You will also need a Django project that is already set up. You could name it tokenization and add an app called access.

Once your app and project are set up, you are good to go.

Simple JWT Installation

As mentioned, I will be using Simple JWT, which provides JWT authentication for the Django Rest Framework (DRF).

DRF is a third-party package for Django used as a toolkit for building Web APIs. It provides a seamless experience while you build, test, debug, and maintain RESTful APIs in Django.

RESTful APIs (Representational State Transfer APIs) are a type of web API that allow communication between different applications over the internet in a fast, reliable, and scalable way.

RESTful APIs are stateless. This means that each request contains all the information needed to complete it, and the server does not need to remember the history of previous requests.

To install Simple JWT, use the command below in your terminal:

pip install djangorestframework_simplejwt

How to Set Authentication to Simple JWT

Go to your project (tokenization), and in the settings.py file, add the following code to configure the REST framework to use Simple JWT for authentication:

REST_FRAMEWORK = {
    'DEFAULT_AUTHENTICATION_CLASSES': [
        'rest_framework_simplejwt.authentication.JWTAuthentication',
    ],
}

The code above specifies the default authentication class to be used for all API views in the application.

'DEFAULT_AUTHENTICATION_CLASSES': ['rest_framework_simplejwt.authentication.JWTAuthentication'] sets the default authentication class to be JWTAuthentication from the rest_framework_simplejwt package. This means that all API views in the project will use JWT authentication by default.

How to Define Uniform Resource Locator Patterns

In your project (tokenization), create a file (if you have not created one yet) named urls.py. Then add the following code:

from django.urls import path
from rest_framework_simplejwt import views as jwt_views

urlpatterns = [
    path('api/token/', jwt_views.TokenObtainPairView.as_view(), name='token_obtain_pair'),
    path('api/token/refresh/', jwt_views.TokenRefreshView.as_view(), name='token_refresh'),
]

The path function creates a new URL pattern and maps it to the specified view.

The URL (Uniform Resource Locator) path /api/token/ is mapped to the view jwt_views.TokenObtainPairView.as_view(). The as_view() method turns the class-based view into a callable view function that Django can use in routing. The name parameter makes it easy to refer to the URL pattern in other parts of your code.

We have now created an endpoint for obtaining JWTs. If a request is made to the endpoint, the TokenObtainPairView view handles the request and, when the credentials are valid, returns an access and refresh token pair in the response for authentication.

How to Customize the Token Timelines

To customize the lifetimes of the tokens, first add rest_framework_simplejwt to the INSTALLED_APPS section of the settings.py file in the project (tokenization). We add it there for configuration purposes.

To set the lifetimes we want, we will first create a dictionary called SIMPLE_JWT. Then we'll add keys to hold the lifetimes of the access and refresh tokens.

The code snippet below shows how to set up the timelines for the tokens:

from datetime import timedelta

INSTALLED_APPS = [
    # other apps
    'rest_framework_simplejwt',
]

SIMPLE_JWT = {
    'ACCESS_TOKEN_LIFETIME': timedelta(minutes=5),
    'REFRESH_TOKEN_LIFETIME': timedelta(days=1),
}

Before you use timedelta you need to import it, which is why the snippet starts with from datetime import timedelta.

Visual Interaction of the API

In this section, we will use the Django Rest Framework web interface to access the endpoints.

Testing tokenization

The access and refresh tokens are highlighted in red, as shown above. To get the tokens for a user, you need to input the correct password and username for an existing user.

To exchange a refresh token for a new access token, use this endpoint: /api/token/refresh/

Refresh token

A refresh token gets an access token without the user using their login credentials to extend the user's session. This provides a seamless user experience and improves security by reducing the number of times a user has to key in their credentials.
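Outside the browsable interface, a client talks to the same endpoints over HTTP. The sketch below builds the three kinds of requests involved using Python's standard library, without sending them. The server address, the credentials, the placeholder tokens, and the protected endpoint path are all assumptions for illustration.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local development server


def json_post(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one of the token endpoints."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Exchange a username and password for an access/refresh pair.
obtain = json_post("/api/token/", {"username": "demo", "password": "demo-password"})

# Exchange a refresh token for a new access token.
refresh = json_post("/api/token/refresh/", {"refresh": "<refresh token>"})

# Call a protected endpoint by sending the access token as a Bearer token.
protected = urllib.request.Request(
    BASE_URL + "/api/some-protected-view/",  # hypothetical endpoint
    headers={"Authorization": "Bearer <access token>"},
)

# With the server running, a request can be sent like this:
# with urllib.request.urlopen(obtain) as response:
#     tokens = json.load(response)  # a dict with "access" and "refresh" keys
```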

How to Add a Homepage

If you want to create a visually appealing interface, you can build a custom homepage to replace the current display of the API endpoints and error messages.

To add a homepage to your Django project, follow these steps:

Create a templates folder in your application (access). Then, add an index.html file inside.
Create a static folder in your application (access) and add an img folder inside.
In this case, I wanted to display an image, so I added me.png to the img folder.
Add the code below to your HTML file:

{% load static %}
<!DOCTYPE html>
<html>
<head>
    <title>Home</title>
    <style>
        body {
            background-image: url("{% static 'img/me.png' %}");
            background-size: cover;
        }
    </style>
</head>
</html>

You will notice that we needed to add {% load static %} at the top of the file for proper styling and functioning of the site. {% load static %} loads Django's static template tag, which builds the URLs for files like stylesheets (CSS), scripts (JS), and images so that your HTML templates can reference them.

How to Define the URL Patterns For Your Home Page

In your app's (access) urls.py file (create one if you do not have it) add this:

from django.urls import path
from django.views.generic import TemplateView

app_name = 'access'

urlpatterns = [
    # template_name must match the file created in the templates folder
    path('', TemplateView.as_view(template_name='index.html'), name='home'),
]

In your project's (tokenization) settings.py file, add the following code so that the static files are served:

# Static files settings (os.path.join requires "import os" at the top of settings.py)
STATIC_ROOT = os.path.join(BASE_DIR, 'static')
STATIC_URL = '/static/'
STATICFILES_DIRS = [os.path.join(BASE_DIR, 'access/static')]

Once you run the server, your home page should be like this:

home page

Bugs and Solutions

At some point, I may have deleted the migrations file, which deleted some of the data I had. This caused the following error: django.db.utils.OperationalError: no such table: customUser.

I was able to solve this by running the command python manage.py migrate --run-syncdb. The command syncs tables by looking for tables that have not been created and then creating them.

I also had an issue of using a port that was already being used by a program I had left running and forgotten about.

To free a port, you should identify the process using it by running the command lsof -i :<port>. This will show you the user running it and its PID. To stop the process use kill <PID>. Additionally, you can use sudo kill -9 -u .

It's good to know that killing a process you do not own requires administrator permissions. Also, stopping a process can cause data loss or unusual behavior, so ensure your data is backed up before doing this.

Wrapping Up

In this tutorial, you have learned how JWTs work, the structure of different tokens, how to use JWT and DRF to get tokens, how to create and serve static files in Django, and how to handle deleting migration files and kill a port.

There is so much you can learn about Django and the Django Rest Framework.

The code for this article can be found here.

May your keyboard be swift, your bugs be few, and your fun meter be off the charts as you code away!

Thanks for reading my article on how to implement tokenization using JWT and Django Rest Framework. I'm always up for a good chat about coding and tech, so give me a follow on Twitter and let's continue the conversation there.

The Evolution of Tokenization – Byte Pair Encoding in NLP

Harshit Tyagi — Tue, 05 Oct 2021 15:26:44 +0000

Natural Language Processing may have come a little late to the AI game, but companies like Google and OpenAI are working wonders with NLP techniques these days.

These companies have released state-of-the-art language models like BERT, GPT-2, and GPT-3. And GitHub Copilot and OpenAI Codex are among the popular applications that have been in the news lately.

As someone who has had very limited exposure to NLP, I decided to take it up as an area of research so I can learn more about it. My next few articles and videos will focus on sharing what I learn after dissecting some important components of NLP.

Main Components of NLP

NLP systems have three main components that help machines understand natural language:

Tokenization
Embedding
Model architectures

Top Deep Learning models like BERT, GPT-2, and GPT-3 all share the same components but with different architectures that distinguish one model from another.

In this article (and the notebook that accompanies it), we are going to focus on the basics of the first component of an NLP pipeline which is tokenization. It's an often overlooked concept, but it is a field of research in itself.

We have come so far from the traditional NLTK tokenization process. And though we have state-of-the-art algorithms for tokenization, it's always a good practice to understand its evolution and how we got to where we are now.

So, here's what we'll cover:

What is tokenization?
Why do we need a tokenizer?
Types of tokenization – Word, Character, and Subword.
Byte Pair Encoding Algorithm - a version of which is used by most NLP models these days.

The next part of this tutorial will dive into more advanced (or enhanced versions of Byte Pair Encoding) algorithms:

Unigram Algorithm
WordPiece – BERT transformer
SentencePiece – End-to-End tokenizer system

What is Tokenization?

Tokenization is the process of representing raw text in smaller units called tokens. These tokens can then be mapped with numbers to further feed to an NLP model.

Here's an overly simplified example of what a tokenizer does:

## read the text and enumerate the tokens in the text
with open('example.txt', 'r') as f:
    text = f.read()  # read a text file

words = text.split(" ")  # split the text on spaces

tokens = {v: k for k, v in enumerate(words)}  # generate a word to index mapping

Here, we have simply mapped every word in the text to a numerical index. This is, of course, a very simple example and we have not considered grammar, punctuation, or compound words (like test, test-ify, test-ing, and so on).

So we need a more technical and accurate definition of tokenization for our work here. To take into account all punctuation and every related word, we need to start working at the character level.

There are multiple applications of tokenization. One of the use cases comes from compiler design where you might need to parse computer programs to convert raw characters into keywords of a programming language.

In deep learning, tokenization is the process of converting a sequence of characters into a sequence of tokens which further needs to be converted into a sequence of numerical vectors that can be processed by a neural network.

Why do we need a Tokenizer?

The need for a tokenizer came from the question "How can we make machines read?"

A common way of processing textual data is to define a set of rules in a dictionary and then look up that fixed dictionary of rules. But this method can only go so far, and we want machines to learn these rules from the text that they read.

Now, machines don't know any language, nor do they understand sound or phonetics. They need to be taught from scratch and in such a way that they can read any language that's out there.

Quite a task, right?

Humans learn a language by connecting sound to the meaning and then we learn to read and write in that language. Machines can't do that, so they need to be given the most basic units of text to start processing the text.

That's where tokenization comes into play. It breaks down the text into smaller units called "tokens".

And there are different ways of tokenizing text which is what we'll learn now.

Different ways to tokenize text

To make the deep learning model learn from the text, we need a two-step process:

Tokenize – decide the algorithm we'll use to generate the tokens.
Encode the tokens to vectors

Word-based tokenization

As the first step suggests, we need to decide how to convert text into small tokens. A simple and straightforward method that most of us would propose is to use word-based tokens, splitting the text by spaces.

Problems with Word tokenizer

There's a high risk of missing words in the training data. With word tokens, your model won't recognize the variants of words that were not part of the data on which the model was trained.

So, if your model has seen foot and ball in the training data but the final text has football, the model won't be able to recognize the word and it will be treated as an unknown (<UNK>) token.
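Here is a minimal sketch of that failure with a word-level vocabulary. The training sentence, the function names, and the choice of index 0 for the unknown token are assumptions made for this example.

```python
UNK = "<UNK>"


def build_vocab(training_text: str) -> dict:
    """Assign an index to every distinct word, reserving 0 for unknown words."""
    vocab = {UNK: 0}
    for word in training_text.split():
        vocab.setdefault(word, len(vocab))
    return vocab


def encode(text: str, vocab: dict) -> list:
    """Map each word to its index, falling back to the unknown token."""
    return [vocab.get(word, vocab[UNK]) for word in text.split()]


vocab = build_vocab("my foot hit the ball")
ids = encode("the football hit my foot", vocab)
print(ids)  # "football" was never seen as a whole word, so it maps to index 0
```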

Similarly, punctuation poses another problem. For example, let or let's will need individual tokens which is an inefficient solution. This will require a huge vocabulary to make sure you've thought of every variant of the word.

Even if you add a lemmatizer to solve this problem, you're adding an extra step in your processing pipeline.

It's also tough to handle slang and abbreviations. We use lots of slang and abbreviations in text these days, such as "FOMO", "LOL", "tl;dr" and so on. What do we do for these words?

Finally, what if the language doesn't use spaces for segmentation? For a language like Chinese, which doesn't use spaces for word separation, this tokenizer will fail completely.

After encountering these problems, researchers looked into another approach which involved tokenizing all the characters.

Character-based tokenization

To resolve the problems associated with word-based tokenization, data scientists tried an alternative approach of character-by-character tokenization.

This did solve the problem of missing words, as now we are dealing with characters that can be encoded using ASCII or Unicode. Now it could generate embedding for any word.

Every character, whether it was a space, apostrophe, colon, or whatever can now be assigned a symbol to generate a sequence of vectors.

But this approach had its own cons.

Problems with character-based models

First, this approach requires more computing resources. Character-based models will treat each character as a token. And more tokens means more input computations to process each token which in turn requires more compute resources.

For example, for a 5-word long sentence, you may need to process 30 tokens instead of 5 word-based tokens.
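A quick way to see the difference is to count both kinds of tokens for one sentence. The sentence below is just an illustrative choice, and the exact ratio depends on the words involved.

```python
sentence = "it is going to rain"

word_tokens = sentence.split(" ")
char_tokens = list(sentence)  # every character, including spaces, is a token

print(len(word_tokens), len(char_tokens))
```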

Also, it narrows down the number of NLP tasks and applications. With long sequences of characters, you can only use a certain type of neural network architecture.

This limits the type of NLP tasks we can perform. For applications like entity recognition or text classification, character-based encoding might turn out to be an inefficient approach.

Finally, there's a risk of learning incorrect semantics. Working with characters could generate incorrect spellings of words. Also, with no inherent meaning, learning with characters is like learning with no meaningful semantics.

What's fascinating is that for such a seemingly simple task, multiple algorithms have been written to find the optimal tokenization policy.

After understanding the pros and cons of these tokenization methods, it makes sense to look for an approach that offers a middle route. We'll want one that preserves the semantics with limited vocabulary that can generate all the words in the text on merging.

Subword Tokenization

With character-based models, we risk losing the semantic features of the word. And with word-based tokenization, we need a very large vocabulary to encompass all the possible variations of every word.

So, the goal was to develop an algorithm that could:

Retain the semantic features of the token, that is information per token.
Tokenize without demanding a very large vocabulary with a finite set of words.

To solve this problem, you can think of breaking down the words based on a set of prefixes and suffixes. For example, we can write a rule-based system to identify subwords like "##s", "##ing", "##ify", "un##" and so on, where the position of the double hash denotes prefix and suffixes.

So, a word like "unhappily" is tokenized using subwords like "un##", "happ", and "##ily".
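A rule-based splitter along these lines can be sketched in a few lines of Python. The prefix and suffix lists and the function name split_subwords are hypothetical. The sketch strips at most one prefix and one suffix per word.

```python
PREFIXES = ["un"]                        # hypothetical rule set for illustration
SUFFIXES = ["ily", "ing", "ify", "s"]


def split_subwords(word: str) -> list:
    """Split a word into at most one prefix, a stem, and one suffix."""
    tokens = []
    for prefix in PREFIXES:
        # Only strip the prefix if something is left over for the stem.
        if word.startswith(prefix) and len(word) > len(prefix):
            tokens.append(prefix + "##")
            word = word[len(prefix):]
            break
    suffix_token = None
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            suffix_token = "##" + suffix
            word = word[: -len(suffix)]
            break
    tokens.append(word)
    if suffix_token:
        tokens.append(suffix_token)
    return tokens


print(split_subwords("unhappily"))
print(split_subwords("testing"))
```

Note that the rules are blind to meaning: a word like "under" would also lose its first two letters, which hints at why hand-written rules do not scale well.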

The model only learns relatively few subwords and then puts them together to create other words. This solves the problems of memory requirement and effort required to create a large vocabulary.

Problems with the subword tokenization algorithm:

First of all, some of the subwords that are created as per the defined rules may never appear in your text to tokenize and may end up occupying extra memory.

Also, for every language, we'll need to define a different set of rules to create subwords.

To alleviate this problem, in practice, most modern tokenizers have a training phase that identifies the recurring text in the input corpus and creates new subword tokens. For rare patterns, we stick to word-based tokens.

Another important factor that plays a vital role in this process is the size of the vocabulary that the user sets. A large vocabulary size allows more common words to be kept as whole tokens, whereas a smaller vocabulary requires more subwords to build every word in the text without using the <UNK> token.

Striking the right balance for your application is key here.

Byte Pair Encoding (BPE) Algorithm

BPE was originally a data compression algorithm that you use to find the best way to represent data by identifying the common byte pairs. We now use it in NLP to find the best representation of text using the smallest number of tokens.

Here's how it works:

Add an identifier (</w>) at the end of each word to identify the end of a word and then calculate the word frequency in the text.
Split the word into characters and then calculate the character frequency.
From the character tokens, for a predefined number of iterations, count the frequency of the consecutive byte pairs and merge the most frequently occurring byte pairing.
Keep iterating until you have reached the iteration limit (set by you) or until you have reached the token limit.

Let's go through each step (in the code) for some sample text. For coding this, I have taken help from Lei Mao's very minimalistic blog on BPE. I encourage you to check it out!

Step 1: Add word identifiers and calculate word frequency

Here's our sample text:

"There is an 80% chance of rainfall today. We are pretty sure it is going to rain."

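import collections

## define the text first

text = "There is an 80% chance of rainfall today. We are pretty sure it is going to rain."

## get the word frequency and add the end of word (</w>) token
## at the end of each word

words = text.strip().split(" ")

print(f"Vocabulary size: {len(words)}")

## store each word as space-separated characters followed by </w>
word_freq_dict = collections.defaultdict(int)
for word in words:
    word_freq_dict[" ".join(word) + " </w>"] += 1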

Step 2: Split the word into characters and then calculate the character frequency

char_freq_dict = collections.defaultdict(int)
for word, freq in word_freq_dict.items():
    chars = word.split()
    for char in chars:
        char_freq_dict[char] += freq

char_freq_dict

Step 3: Merge the most frequently occurring consecutive byte pairings

import re

## create all possible consecutive pairs
pairs = collections.defaultdict(int)
for word, freq in word_freq_dict.items():
    chars = word.split()
    for i in range(len(chars)-1):
        pairs[chars[i], chars[i+1]] += freq

Step 4 - Iterate n times to find the best (in terms of frequency) pairs to encode and then concatenate them to find the subwords

It is better at this point to structure our code into functions. This means that we need to perform the following steps:

Find the most frequently occurring byte pairs in each iteration.
Merge these tokens.
Recalculate the character tokens' frequency with the new pair encoding added.
Keep doing this until there are no more pairs or you reach the end of the for loop.
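The loop described above can be sketched as follows. This is a minimal version in the spirit of the approach in Lei Mao's blog, not the exact notebook code. The function names get_word_freqs, get_pair_freqs, merge_pair, and learn_bpe are assumptions made for this example.

```python
import collections
import re


def get_word_freqs(text: str) -> dict:
    """Map each word (as space-separated characters plus </w>) to its frequency."""
    freqs = collections.defaultdict(int)
    for word in text.strip().split(" "):
        freqs[" ".join(word) + " </w>"] += 1
    return freqs


def get_pair_freqs(word_freqs: dict) -> dict:
    """Count how often each pair of adjacent symbols occurs."""
    pairs = collections.defaultdict(int)
    for word, freq in word_freqs.items():
        symbols = word.split()
        for left, right in zip(symbols, symbols[1:]):
            pairs[left, right] += freq
    return pairs


def merge_pair(pair: tuple, word_freqs: dict) -> dict:
    """Replace every occurrence of the pair with a single merged symbol."""
    # Match the pair only on whole-symbol boundaries.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    merged = "".join(pair)
    return {pattern.sub(lambda _: merged, word): freq for word, freq in word_freqs.items()}


def learn_bpe(text: str, num_merges: int) -> tuple:
    """Run up to num_merges greedy merges and return (merges, final word freqs)."""
    word_freqs = get_word_freqs(text)
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_freqs(word_freqs)
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # most frequent pair in this iteration
        word_freqs = merge_pair(best, word_freqs)
        merges.append(best)
    return merges, word_freqs


text = "There is an 80% chance of rainfall today. We are pretty sure it is going to rain."
merges, word_freqs = learn_bpe(text, num_merges=5)
print(merges)
```

When two pairs are equally frequent, max keeps the one it encountered first, so the order of later merges depends on the order of words in the text.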

For detailed code, you should check out my Colab notebook.

Here’s a trimmed output of those 4 steps:

So as we iterate with each best pair, we merge (concatenate) the pair. You can see that as we recalculate the frequency, the original character token frequency is reduced and the new paired token frequency pops up in the token dictionary.

If you look at the number of tokens created, it first increases because we create new pairings – but the number starts to decrease after a number of iterations.

Here, we started with 25 tokens, went up to 31 tokens in the 14th iteration, and then came down to 16 tokens in the 50th iteration. Interesting, right?

How to improve the BPE algorithm

The BPE algorithm is a greedy algorithm, which means that it tries to find the best pair in each iteration. And there are some limitations to this greedy approach.

So of course there are pros and cons of the BPE algorithm, too.

The final tokens will vary depending upon the number of iterations you have run. This also causes another problem: we now can have different tokens for a single text, and thus different embeddings.

To address this issue, multiple solutions have been proposed. But the one that stood out was a unigram language model that added subword regularization (a new method of subword segmentation) training that calculates the probability for each subword token to choose the best option using a loss function. We'll talk more about this in upcoming articles.

Do we use BPE in BERTs or GPTs?

Models like BERT or GPT-2 use some version of the BPE or the unigram model to tokenize the input text.

BERT included a new algorithm called WordPiece. It is similar to BPE, but has an added layer of likelihood calculation to decide whether the merged token will make the final cut.

Summary

In this blog, you've learned how a machine starts to make sense of language by breaking down the text into very small units.

Now, there are many ways to break text down and so it becomes important to compare one approach with another.

We started off by understanding tokenization by splitting the English text by spaces – but not every language is written the same way (that is using spaces to denote segmentation). So then we looked at splitting by character to generate character tokens.

The problem with characters was the loss of semantic features from the tokens at the risk of creating incorrect word representations or embeddings.

To get the best of both worlds, we looked at subword tokenization which was more promising. And finally we looked at the BPE algorithm to implement subword tokenization.

We'll look more into the next steps and advanced tokenizers like WordPiece, SentencePiece, and how to work with the HuggingFace tokenizer next week.

References and Notes

My post is actually an accumulation of the following papers and blogs that I encourage you to read:

Neural Machine Translation of Rare Words with Subword Units - Research paper that discusses different segmentation techniques based on the BPE compression algorithm.
GitHub repo on Subword NMT(Neural Machine Translation) - supporting code for the above paper.
Lei Mao’s blog on Byte Pair Encoding - I used the code in his blog to implement and understand BPE myself.
How Machines read - a blog by Cathal Horan.

If you’re looking to start in the field of data science or ML, check out my course on Foundations of Data Science & ML.

If you would like to get all my tutorials/blogs delivered directly to your inbox, consider subscribing to my newsletter here.


Universal Ethereum Delegated Transactions: No More Transaction Fees

Nikita Savchenko — Mon, 30 Sep 2019 15:53:22 +0000

TL;DR Check out these back end and front end solutions for delegated transactions. They are universal for any token which supports the delegation of its functions. Read more below.

This mostly technical article provides a universal framework and a working solution for Ethereum tokens and applications that eliminates the need to pay fees in Ether, a problem that is practically killing the user experience of many blockchain applications.

Imagine spending dollars and then being asked to also hand over some Hryvnias as a transaction fee. That's how Ethereum tokens work so far.

In other words, for example, to transfer any Ethereum token (like Tether, DAI, BAT, DREAM, etc.), the user has to also spend some Ether (internal Ethereum platform currency). This introduces a big inconvenience that prevents the mass adoption of DApps: users have to purchase multiple currencies instead of just one to interact with the blockchain network.

The Problem

Tokens, as we imagine them today, are just fuel for applications and services on top of blockchain networks. Organizations create their own tokens (using ICOs, IEOs, etc.) and run services/applications that utilize them, introducing their own micro-economy (widely known as a token economy). But almost every token turns out to be quite a complex currency itself. By design of how blockchain networks work, in order to do something with your tokens, you also need another currency, often Ether (for Ethereum), to be able to transfer them.

To illustrate the problem, let's look into how users come to use different blockchain-powered services and applications like:

Trickle - where you create secure, hourly-based contracts with an untrusted party in any token
Loom - where you use Loom tokens to create sidechains in Loom Network
Cryptokitties.co - where you breed, trade and transfer kitties (ERC721 tokens)
Others (there are a lot!)

All these applications use tokens, and they also require you to purchase Ether. The complexity of using crypto tokens as we know them today is one of the biggest reasons why 99% of crypto startups fail (or avoid adopting real crypto, for example, by replacing it with virtual coins).

As you may already know, the harder it is to use the application, the fewer users it will get right from the beginning. This is something known as The User Onboarding Funnel, which is still a big pain for blockchain-powered applications and services:

The typical user onboarding funnel of a decentralized, blockchain-based application

To understand why I put 0.001% of users prior to the service use, let's see what exactly purchasing some Ether means:

Creating a crypto wallet
Registering on Exchange (and learning all the exchange rules, including country policies!)
Passing KYC (though it's getting easier, still, many countries have limited access to exchanges)
Purchasing a minimum allowed amount of Ether (usually a whopping $50, when you need only about $0.05 to perform one or two transactions)
Withdrawing Ether to your wallet
Not to mention reading lengthy guides on how to perform all these steps properly

Instead of just:

Creating a crypto wallet

Of course, it highly depends on how the application or service is made. But so far, there has been no better way to simplify the onboarding flow than cutting crypto tokens out of it entirely, or turning them into a fake, "virtual" currency with deposit and withdrawal functions. Unfortunately, the latter approach is now the common one across startups and companies adopting crypto, for many good reasons. Another reason could be a monetization strategy, but this is another big story worth a dedicated article (Interested? Leave a comment!).

Getting back to the transaction fees problem, we can state the following, which is hard to argue with.

It is natural for the user to purchase only the cryptocurrency they really need (for instance, tokens: Tether, DAI, BAT, DREAM, etc.), and they would normally expect to pay any transaction fees in this cryptocurrency.

So why not just allow them to do so? Because it's quite complex indeed. Let's see why, and how this has just got easier with our open-sourced solution (at least for Ethereum).

Existing Approaches

From the very beginning of blockchain's existence, there have been a couple of solutions that could simplify the user onboarding flow to the one depicted below, avoiding the step of purchasing an intermediate currency like Ether. Creating a blockchain wallet is still not an easy step, but users who understand the value of the application or service get through it quite well.

The user onboarding funnel with delegated transactions

The solution that avoids intermediate currencies (Ether for Ethereum) is called "delegated transactions", or "meta transactions".

In short, a delegated transaction, or "meta transaction", is a type of blockchain transaction that performs an intended action on one account's behalf while being conducted (published) by another account (the delegate), which actually pays the transaction fees.

There are multiple approaches around the internet to the generalized concept of delegated transactions I am presenting in this article. But it seems that none of them is widely adopted yet, as these approaches are quite complex by nature and very implementation-specific, and some of them are hard to standardize. To be more constructive, existing approaches can be divided into four groups: those that use proxy smart contracts, those that rely on the "operator" pattern, those that embed delegation into the smart contract itself and, theoretically, a blockchain-native implementation (say, in Ethereum 2.0).

1. Delegated transactions approaches which use proxy-contracts

Proxy contracts, or, in this context, identity contracts, are tiny contracts deployed to replace the user account that wants to avoid paying fees. Such a smart contract is programmed to act as a wallet, as well as the "caller" (sender) of other smart contracts' functions. The key is that a delegate account triggers all the actions, while the true "owner" of this smart contract is another user. The user just generates correct signatures in order to control the funds stored at the smart contract's address (that is, in their wallet).

A visualization of what identity contracts look like

Pros of this approach:

It works with any tokens and contracts which are already deployed to the network

Cons of this approach:

Users don't see tokens in their wallet, because the tokens are physically held by the identity smart contract
As a result, custom UIs and custom tools/wallets need to be developed
Initial fees for deploying and assigning the identity smart contract, as opposed to no fees at all
Requires a comprehensive standard to be widely adopted

2. Semi-delegated transactions via "Operator" pattern (ERC777)

There is a token standard that describes this approach: ERC777. In short, any token holder can authorize any other account to freely manage their tokens. I wouldn't call this delegated transactions, but it is worth mentioning because the holder does, to some extent, delegate control over their tokens to other accounts.

A visualization of ERC777 standard's "operator" pattern

Pros of this approach:

Standardized

Cons of this approach:

Highly centralized around the "operator" accounts
Weak security, because operators have 100% control over your tokens
Initial fees for approval transaction
Requires additional UIs/tools development
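
To make the security concern concrete, here is a minimal Python sketch of the "operator" pattern. It is a toy in-memory model, not the ERC777 reference implementation: the method names mirror ERC777's authorizeOperator, revokeOperator and operatorSend, while the class itself and the account names are assumptions made for illustration.

```python
class OperatorToken:
    """Toy in-memory model of the ERC777 "operator" pattern (illustrative only)."""

    def __init__(self, balances):
        self.balances = dict(balances)
        self.operators = {}  # holder -> set of authorized operator accounts

    def authorize_operator(self, holder, operator):
        """Sent by the holder; on-chain this is the approval that costs Ether."""
        self.operators.setdefault(holder, set()).add(operator)

    def revoke_operator(self, holder, operator):
        """Sent by the holder to withdraw the authorization."""
        self.operators.get(holder, set()).discard(operator)

    def operator_send(self, operator, holder, to, amount):
        """Sent by the operator, who pays the network fee for the transfer."""
        if operator not in self.operators.get(holder, set()):
            raise PermissionError("not an operator for this holder")
        if self.balances.get(holder, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[holder] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount


token = OperatorToken({"alice": 100})
token.authorize_operator("alice", "service")
# Nothing limits the amount or the recipient: the operator has full control.
token.operator_send("service", "alice", "mallory", 100)
print(token.balances)  # {'alice': 0, 'mallory': 100}
```

Note that the holder never signs the individual transfer, which is exactly why this pattern requires full trust in the operator.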

3. Delegated transactions embedded directly into a (token) smart contract

Just as it is possible to implement custom fees in a proxy smart contract, paying fees in tokens can be implemented directly in a token smart contract. For example, using the approach I described in my previous article, it is possible to implement a function in a smart contract that transfers tokens by accepting the user's signature, instead of requiring the user to call this function directly. We have implemented this approach in our DREAM Token, which is used on our dreamteam.gg platform.

A visualization of what embedding delegation into the token contract looks like

As you may notice, in contrast to the proxy-contract approach there is no identity contract anymore, and there is an optional way to call other smart contracts directly from the token contract.

Pros of this approach:

Users see their tokens as usual on their wallet's balance
No initial fees for account initialization
May not even require a standard (continue reading)

Cons of this approach:

If you have a (token) smart contract that is already deployed to the network, you cannot apply this approach to it directly. You can, however, always deploy a new token along with, for example, a "migration" utility that allows users to swap tokens (burn the old token and mint a new one)
Because a standard for this approach is not yet well-defined, implementations can vary drastically
A need to develop custom UI/tools for delegated transactions (continue reading — solved!)

4. Delegated transactions on the (blockchain) platform level

This is by far the best of all the approaches described above, but also the one that is not implemented anywhere yet (by anywhere I mean the most popular blockchain platforms). There is hope that support for it will come with the Ethereum 2.0 release, or at least I've heard from Vitalik that they are making progress on something cool there.

Theoretically, we can imagine this approach as being able to make an "offline" signature of two transactions at a time: one which does something useful for the signing account which wants to avoid paying fees (for example, transferring tokens) and another one which does something useful for the delegate (for example, paying fee in tokens to the account which executes these two transactions).

A visualization of what platform-native delegated transactions could look like

But the problem is that, regarding Ethereum 2.0, this feature has a chance to land only in 2022 or even later. I also suppose that this feature will still require a dedicated back end (similar to the one introduced in this article), as it is hard to imagine how miners would accept fees in tokens. Simply put, if some of them refuse to accept fees in tokens, then it makes little sense to do it at the "mining" level at all, not to mention how much it would take to track all token prices and volumes across exchanges in a decentralized manner.

Pros of this approach:

No need to change smart contracts that were already deployed
No initial fees for account initialization
May not even require a custom UI/tools if standardized

Cons of this approach:

Most likely, will still require a centralized back end (the "delegate")
Not yet implemented on a platform level (as of 2019)

The Solution

Of the four approaches above, setting aside the platform-level approach, which is yet to be implemented and standardized in 2022+, the most appealing is the third, where we embed delegated functions directly into the token smart contract. This way, we preserve the standard token paradigm, allowing wallets to work normally with the smart contract, and we have no need to wait until delegated transactions land natively in one of the top blockchain platforms. We will stick to this approach and make it universal just below.

Delegated transaction support programmed right into the token smart contract is awesome. But how do we deal with its cons? In fact, the only problem that is tough to deal with is that you cannot modify existing smart contracts: if you have already deployed a token without delegated functions (for instance, a standard ERC20 or ERC721 token), you will need to deploy a new token smart contract. The next step, in this case, would be adding a way to migrate tokens from one smart contract to another. For example, you can decide to implement one more function in the new smart contract that will allow token holders to migrate their assets from the old smart contract.

Token migration function implementations can vary. At one end, you can implement receiveApproval in the new token if the previous token supports approveAndCall. At the other, if you have just a bare minimal ERC20, you can use the approve + transferFrom flow: the user approves tokens to the new token contract address and then calls a function in the new contract that burns the old tokens and mints new ones (but this requires the user to pay a standard fee for the approval transaction). Actually, there is more: you can decide not to burn old tokens but to "lock" them in the new token smart contract while minting new tokens. This opens an opportunity to implement two-sided token migration, which is awesome: you won't need to list the "new" token on exchanges, while users will still be able to send the old token to exchanges without fees in Ether! This approach is worth a whole new article, so if you want to know more details on how to do it, please file an issue here.
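
The "lock instead of burn" idea can be sketched in a few lines of Python. This is a toy bookkeeping model under my own assumptions (the class and method names are hypothetical, and both token ledgers live in one object for brevity); it only illustrates the invariant that every new token in circulation is backed by one locked old token, which is what makes migration possible in both directions.

```python
class TwoSidedMigration:
    """Toy model: the new token contract locks old tokens and mints new ones."""

    def __init__(self, old_balances):
        self.old = dict(old_balances)  # balances of the old token
        self.new = {}                  # balances of the new token
        self.locked = 0                # old tokens held by the new contract

    def migrate(self, holder, amount):
        """Old -> new: lock old tokens, mint the same amount of new ones."""
        if amount <= 0 or self.old.get(holder, 0) < amount:
            raise ValueError("invalid amount")
        self.old[holder] -= amount
        self.locked += amount
        self.new[holder] = self.new.get(holder, 0) + amount

    def migrate_back(self, holder, amount):
        """New -> old: burn new tokens, release the locked old ones."""
        if amount <= 0 or self.new.get(holder, 0) < amount:
            raise ValueError("invalid amount")
        self.new[holder] -= amount
        self.locked -= amount
        self.old[holder] = self.old.get(holder, 0) + amount


m = TwoSidedMigration({"alice": 10})
m.migrate("alice", 7)       # alice now holds 3 old and 7 new tokens
m.migrate_back("alice", 2)  # e.g. to send old tokens to an exchange
print(m.old["alice"], m.new["alice"], m.locked)  # 5 5 5
```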

In my previous article, I provided an example of a token smart contract that supports delegation of functions such as transfer, transferFrom, approve and approveAndCall. It is exactly these "delegated" functions that allow users to pay fees in tokens instead of Ether.

How delegation works in Ethereum, in short. Read more in this article.

But that wasn't enough to start the mass adoption. In this article, I am providing a complete universal back end solution (Transaction Publisher in the picture above), as well as a configurable widget (check it here), which allows you to replace Ether fees with token fees today.

Some key points before we dive in:

This delegated transactions back end is made to be universal, or standard-free, meaning that you can have any implementation of delegated functions and use any signature standard(s) in your token. From the back end standpoint, you just need to write a manifest file for your token, describing its usage.
Currently, converting collected token fees back to Ether is a manual action on exchanges. Automating it could be a future improvement (if needed).

The Concept Behind the Universal Solution

What does it mean that the token supports delegated transactions? Let's look at it using the ERC20 standard token as an example.

Smart Contract

As for the token smart contract, the approach is quite straightforward. For every method like transfer(to, value) that we want to be "delegatable", we add a companion function which, instead of inspecting msg.sender, accepts the user's signature and does what the original function was meant to do after validating this signature inside the smart contract. Thus, for example, for the transfer(to, value) function we can add a transferViaSignature(to, value, ...additionalParams) function. As you know from public-key cryptography, no one can create a valid signature except the private key owner, which is why this approach is as secure as Ethereum itself.

And the coolest part is that the delegated function's implementation, as well as its signature, doesn't matter much from the delegate back end's standpoint. You can even decide to implement one "call by signature" function for all other functions that the smart contract supports. The delegate back end just needs to know how to call this function, which is solved by providing an off-chain contract manifest to the delegate back end. For example, the additionalParams argument in transferViaSignature can vary and can include anything from this list, if not more: the fee, the fee recipient address, an expiration timestamp, the signature standard used, the signature itself, a nonce or any other unique delegated transaction ID, and so on. To understand why exactly these arguments are needed in the smart contract design, read my previous article.

We also want to allow "delegates" to earn something to cover their Ether spending, as well as to be profitable. Thus, we have to add a fee, but a much more natural fee than Ether: a fee in the token itself. For example, if you need to transfer 100 tokens, you pay, say, 3 more tokens to the delegate to perform the transfer, with the exact amount depending on the token's price and network conditions, and this should be enforced in the smart contract logic.
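
Here is a minimal Python sketch of what such a companion function has to check. It is a toy in-memory model, not the DREAM Token contract. The important assumption: Python's standard library has no ECDSA, so a per-account HMAC secret stands in for the user's key pair, whereas a real contract would recover the signer's address from the signature on-chain. The class name, parameter order and message layout are all hypothetical.

```python
import hashlib
import hmac


class DelegatableToken:
    """Toy in-memory model of a token with a signature-based transfer."""

    def __init__(self, balances, signing_keys):
        self.balances = dict(balances)
        # ASSUMPTION: per-account HMAC secrets stand in for ECDSA key pairs.
        self._keys = dict(signing_keys)
        self.used_nonces = set()  # (signer, nonce) pairs already executed

    @staticmethod
    def digest(signer, to, value, fee, nonce, deadline):
        """Hash every field the user authorizes, so none can be altered."""
        message = f"{signer}|{to}|{value}|{fee}|{nonce}|{deadline}"
        return hashlib.sha256(message.encode()).digest()

    def sign(self, signer, to, value, fee, nonce, deadline):
        """What the user's wallet would do off-chain, free of charge."""
        msg = self.digest(signer, to, value, fee, nonce, deadline)
        return hmac.new(self._keys[signer], msg, hashlib.sha256).hexdigest()

    def transfer_via_signature(self, delegate, signer, to, value, fee,
                               nonce, deadline, signature, now):
        """Called by the delegate, who pays the network fee in Ether."""
        expected = self.sign(signer, to, value, fee, nonce, deadline)
        if not hmac.compare_digest(expected, signature):
            raise ValueError("invalid signature")
        if now > deadline:
            raise ValueError("signature expired")
        if (signer, nonce) in self.used_nonces:
            raise ValueError("nonce already used")  # blocks replays
        if self.balances.get(signer, 0) < value + fee:
            raise ValueError("insufficient balance")
        self.used_nonces.add((signer, nonce))
        self.balances[signer] -= value + fee
        self.balances[to] = self.balances.get(to, 0) + value
        self.balances[delegate] = self.balances.get(delegate, 0) + fee


token = DelegatableToken({"alice": 150}, {"alice": b"alice-secret"})
sig = token.sign("alice", "bob", 100, 3, nonce=1, deadline=1_000)
token.transfer_via_signature("delegate", "alice", "bob", 100, 3,
                             nonce=1, deadline=1_000, signature=sig, now=500)
print(token.balances)  # {'alice': 47, 'bob': 100, 'delegate': 3}
```

The nonce and the deadline are what stop the delegate from publishing the same signed request twice or holding on to it indefinitely.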

Back End

All right, now we have a token that allows transferring someone else's tokens by using their signature. Now, the crucial part is to automate the process of requesting and publishing such transactions. And here is where our open-sourced back end (and front end) kicks in.

Below is the sequence diagram describing how the front end (client) communicates with the back end, from the delegated transaction request to its publishing to the network:

(hidden on the diagram) The client requests information from the delegate back end to understand which contracts it supports, as well as which functions it can delegate.
The client requests a particular smart contract's function to be delegated. Most importantly, the back end returns the fees it charges and the data to be signed by the client.
The client signs the data in their wallet. Signing is a free operation, unlike publishing a transaction to the network.
The client sends their signature back, thus confirming their intent to perform this particular delegated transaction. The back end validates this transaction against the current network.
Finally, the back end publishes a transaction to the network.
(hidden on the diagram) The client constantly polls the back end for the delegated request status until it receives a mined status. Note: it is important to poll the back end instead of using a transaction hash to learn when the transaction is mined. It is very common for the gas price to increase suddenly, and, for the transaction to be mined quickly, the back end may republish it with a higher gas price (which changes the transaction hash). Republishing is not implemented yet, but it is very likely to be implemented soon.

Sequence diagram representing the simplified flow of how a delegated transaction is delivered to the network
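
The steps above can be sketched as a tiny in-memory state machine. This is not the actual back end API: the class, method names and every status value other than "mined" are assumptions made for illustration, the fee is a flat number, and signature validation is reduced to a presence check.

```python
import itertools


class DelegateBackend:
    """Toy in-memory stand-in for the delegate back end's request flow."""

    def __init__(self, supported, fee):
        self.supported = set(supported)  # (contract, function) pairs
        self.fee = fee                   # flat token fee, for simplicity
        self.requests = {}
        self._ids = itertools.count(1)

    def info(self):
        """Step 1: which contract functions can be delegated."""
        return sorted(self.supported)

    def request(self, contract, function, args):
        """Step 2: return the fee and the data the client must sign."""
        if (contract, function) not in self.supported:
            raise ValueError("function cannot be delegated")
        request_id = next(self._ids)
        data_to_sign = f"{contract}.{function}{tuple(args)} fee={self.fee}"
        self.requests[request_id] = {"status": "new", "data": data_to_sign}
        return {"id": request_id, "fee": self.fee, "data_to_sign": data_to_sign}

    def confirm(self, request_id, signature):
        """Steps 4-5: accept the signature and publish the transaction."""
        if not signature:
            raise ValueError("missing signature")
        req = self.requests[request_id]
        req["signature"] = signature
        req["status"] = "publishing"

    def on_mined(self, request_id):
        """Called when the network reports the transaction as mined."""
        self.requests[request_id]["status"] = "mined"

    def status(self, request_id):
        """Step 6: the client polls this rather than a transaction hash."""
        return self.requests[request_id]["status"]


backend = DelegateBackend({("DREAM", "transfer")}, fee=3)
quote = backend.request("DREAM", "transfer", ["bob", 100])
# Step 3 (signing) happens in the user's wallet; a placeholder is used here.
backend.confirm(quote["id"], signature="0xsigned")
backend.on_mined(quote["id"])
print(backend.status(quote["id"]))  # mined
```

Because the client only ever asks about the request ID, the back end stays free to republish the transaction under a different hash.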

This approach is universal, and only requires the manifest file for the back end to understand how to calculate fees and which signature standard to use on the client side. Here is another visualization of the components of the system and their interaction sequence:

Component diagram

We've provided comprehensive documentation for this solution. You can check how the back end API is structured, as well as find the token manifest file which describes how to work with a particular token contract. We encourage you to contribute your own tokens there!

And you don't need much setup: it's already there with the universal front end!

Front End

The open-sourced front end of the delegated transactions solution is a user interface that can be set up for any token: just run your delegated transactions back end and you are ready to go!

What it looks like

It is made to be an embeddable widget, which will guide the user through the procedure of sending tokens. You can plug in any back end or token, or call any token function, by specifying additional URL parameters.

Using this widget, and by implementing something similar to the widely used but not standardized approveAndCall function in your token smart contract, you will be able to call other smart contracts with arbitrary data by paying fees in tokens!

Here is a quick guide for you if you want to play with this UI yourself:

Access the widget via this link.
It will ask you to switch to the Kovan test network.
Get some test Ether using any available Kovan faucet.
Use test Ether to mint some test tokens: call the mintTokens function in the token smart contract, which will give you 10 test tokens.
Now, get back to the widget and try to transfer these tokens!

If you open up the browser's developer tools, you may notice that there are a couple of back ends connected by default. They provide the front end with all the information required to make a delegated request according to the given widget URL parameters. All back ends are queried when the widget loads and, if any of them can provide delegation for a particular contract function, the widget requests additional information: fees, supported signatures, etc. If multiple back ends can delegate the same contract function, all of them are queried and the one offering the best fee is used for the transaction.
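
The selection rule boils down to taking a minimum over the back ends that answered. A minimal sketch, assuming each back end's answer has already been reduced to a single token fee (the function name and the URLs are hypothetical):

```python
def pick_backend(quotes):
    """Return (url, fee) for the cheapest back end, or None if none can delegate.

    `quotes` maps a back end URL to its fee in tokens, or to None when that
    back end cannot delegate the requested contract function.
    """
    available = {url: fee for url, fee in quotes.items() if fee is not None}
    if not available:
        return None
    url = min(available, key=available.get)
    return url, available[url]


quotes = {
    "https://backend-a.example": 5,
    "https://backend-b.example": 3,
    "https://backend-c.example": None,  # does not support this function
}
print(pick_backend(quotes))  # ('https://backend-b.example', 3)
```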

Transaction mining time is seemingly fixed, but it can vary because of network conditions. The back end uses the actual network fee when calculating the token fee; however, the network fee may change before the user decides to execute the transaction. In that case, an "underpriced" transaction is submitted to the network and can be pending for a while. The back end is currently not programmed to deal with this case, but it might be implemented in the future: transactions would be republished with higher gas fees if the network fee increases. We would also need to account for this in the token fee.
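
To see how a quote can become underpriced, here is a hedged sketch of one way to derive a token fee from the network fee. The formula, the 50% margin and all the numbers are assumptions for illustration, not the back end's actual pricing logic.

```python
from decimal import Decimal

WEI_PER_ETHER = Decimal(10) ** 18
GWEI = 10 ** 9


def token_fee(gas_limit, gas_price_wei, tokens_per_ether, margin=Decimal("0.5")):
    """Convert the expected Ether cost of a transaction into a token fee.

    All inputs are hypothetical: a real back end would read the gas price
    from the network and the token price from an exchange.
    """
    ether_cost = Decimal(gas_limit) * Decimal(gas_price_wei) / WEI_PER_ETHER
    return ether_cost * Decimal(tokens_per_ether) * (1 + margin)


# Quote computed when the gas price is 10 gwei.
quoted = token_fee(100_000, 10 * GWEI, tokens_per_ether=2_000)
# If the gas price doubles before the user signs, the same transaction
# would need a fee twice as large to keep the delegate's margin.
needed = token_fee(100_000, 20 * GWEI, tokens_per_ether=2_000)
print(quoted == 3, needed == 6)  # True True
```

With these numbers, 100,000 gas at 10 gwei costs 0.001 Ether, or 2 tokens, and the margin brings the quote to 3 tokens. At 20 gwei the raw cost alone is 4 tokens, so the signed quote no longer covers it.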

Signature Standards

The last question you may be wondering about is which signature standard to use for your token. There are several available: eth_sign (deprecated), eth_personalSign (note that old Trezor and Ledger devices produce different signatures because of an ambiguity in the standard, so you may want to support both variants), eth_signTypedData (deprecated), eth_signTypedData_v3 and so on. I would recommend supporting at least two: the ageless eth_personalSign and the newer eth_signTypedData_v3 (as of 2019).

Signature standards comparison — what the user sees

The front end is programmed to always prefer a user-readable standard like eth_signTypedData_v3 over others such as eth_personalSign. So if your token supports several signature standards, and you added all of them to your token's manifest file, it will display the eth_signTypedData_v3 prompt first.
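
This preference can be expressed as an ordered list. In the sketch below, only the rule that eth_signTypedData_v3 comes before eth_personalSign is taken from the description above; the rest of the ordering, the wallet capability check and the function name are my own assumptions.

```python
# Most user-readable first. Only the relative order of the first two
# entries comes from the article; the rest is an assumed ordering.
PREFERENCE = [
    "eth_signTypedData_v3",
    "eth_personalSign",
    "eth_signTypedData",
    "eth_sign",
]


def choose_standard(token_supports, wallet_supports):
    """Pick the most preferred standard supported by both token and wallet."""
    for standard in PREFERENCE:
        if standard in token_supports and standard in wallet_supports:
            return standard
    return None


manifest = {"eth_personalSign", "eth_signTypedData_v3"}
print(choose_standard(manifest, {"eth_personalSign", "eth_signTypedData_v3"}))
# eth_signTypedData_v3
print(choose_standard(manifest, {"eth_personalSign"}))
# eth_personalSign
```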

Conclusion

Delegated transactions are great: they solve one of the biggest problems of blockchain application adoption, which eases the mass adoption of crypto overall. Here are a few closing points to answer the last questions you may still have after reading this article:

Our open-source solution is free to use and production-ready. Feel free to apply it to your applications or tokens!
The described approach compromises neither security nor decentralization. Think of it this way: the centralized back end is only a helper for someone who wants to transfer tokens without a fee in Ether. If the back end is hacked, or is simply unavailable, users can still interact with the network just as before, by paying fees in Ether. Likewise, the back end cannot harm or trick the user into giving up their tokens when a proper signature standard is used (it's up to your token implementation).
There is a way to support delegated transactions for existing, already-deployed tokens. However, it requires an additional Ether-consuming step to migrate existing tokens to a new token contract. And by programming the new token contract properly, as well as designing your application to work with both tokens, you can even avoid the need to list the new token on exchanges.
Using the existing tokens available in the delegated transactions back end and front end repositories as examples, you can produce a manifest for your own token.
Read the instructions on how to set up your own back end for a token, and then add it to the URL of your widget (or commit to the open-source repository).
Have a token which already supports delegated transactions? Plug it into our UI with these three quite simple steps: (1) create a manifest for your token and add your token's ABI file while setting up the delegate back end, (2) run this back end, exposing a public API URL, and (3) use URL parameters in the widget to reference your back end, or commit it directly to our open-source repository. Read more about it in the readme file on GitHub.

I hope this was a really helpful piece of information for everyone in search of the incredible. Feel free to contact me or file an issue here if I missed something. Have fun, and let the token economy be simple!

How does tokenization work, anyway?

freeCodeCamp — Sat, 20 Oct 2018 13:06:04 +0000

By Albert Ho

Not everything will be tokenized, but that which can be will be.

2017 saw the hype of utility tokens and ICOs. 2018 marks the hyped start of asset tokenization and the launch of security tokens and platforms. This trend is notably huge in the U.S., given how A16z and GGV Capital have invested in tokenization platforms like TrustToken, Harbor, and Polymath.

In Asia, the trend is starting to pick up too. For instance, Rate3 Network was funded by various notable Asian VCs that include Matrix Partners China and Fenbushi Capital.

Conceptually, tokenization might seem easy: issue an ERC-20 token (or any other blockchain token), imbue it with legal and ownership rights, and you can trade it easily. However, this warrants a much deeper look: how do you distinguish claim rights from ownership rights? What are the differences in tokenization between different asset classes? What is impeding the adoption of tokenization?

Thinking comprehensively about tokenization requires an understanding of blockchains and smart contracts, law, finance, and economics. There has already been in-depth research on each of these components.

In this piece, I want to give a comprehensive introduction to tokenization, using real estate as the primary example. Here’s what we’ll discuss:

What exactly is Tokenization? (Is it not just securitization?)
What are the real benefits of tokenization? (Why do we even bother?)
What are some of the tougher issues to think about? (How do we ask the right questions?)
What are the challenges for tokenization? (What’s stopping adoption?)
How will the tokenized future be? (What are some elements?)

Let’s get started!

What is Tokenization?

Isn’t Tokenization just securitization?

For structured finance professionals, tokenization might seem like securitization. In summary, securitization consists of a few steps:

The originator (owner of the assets) collects the assets in a pool and transfers the pool to a legal entity (usually a special purpose vehicle, or SPV). Through this legal structure, the assets are not exposed to counterparty risk or to the risk of the originator bank's bankruptcy.
The SPV structures the assets within the pool into several tranches, usually according to different risk levels and characteristics. Securities are issued, backed by the cash flows generated by the underlying assets.
After issuing the securities, the SPV sells them to investors and then transfers the proceeds to the originator.

Securitization has various flaws.

The classic securitization process is extremely costly and time-consuming. The entire process can cost millions of dollars and take up to a year. It requires agreements with various parties under conditions of asymmetric information, and it has to deal with heterogeneous asset data.

Furthermore, there might be a lack of full transparency at the various stages of securitization, which hinders auditing and rating of the underlying assets. In the sub-prime mortgage crisis, there was no transparency into either the credit pool or the audit process, which led to defaults on the bonds issued.

Clearly, tokenization =/= securitization

Tokenization, in its simplest definition, refers to converting an asset into a digital token on a blockchain. The biggest difference between tokenization and securitization is that programmability is introduced into the tokenized asset. This way, business logic can be built in, reducing the need for manual settlements. Smart contracts can have functions for automatic transactions, formulas for calculating asset prices and other specific features.

What are the real benefits of Tokenization then?

Photo by @sanfrancisco, Unsplash

Numerous research pieces have talked about various benefits of tokenization, but these various benefits can be categorized into three core principles: 1) liquidity, 2) programmability and 3) immutable proof of ownership.

Key Principle 1: Liquidity

The World Economic Forum predicts that over the next ten years, 10% of the world’s GDP will be stored in crypto assets, amounting to $10 trillion. This is primarily due to increased fractional ownership and the unlocking of liquidity premiums.

Assuming no legal and regulatory barriers, tokenization allows for increased fractional ownership. Most tokens are divisible to 18 decimal places, whereas fiat currency can only be broken down to $0.01. Fractional ownership lowers the barriers of entry for new investors. For instance, instead of paying $1 million for a new apartment, I can pay $50,000 for a tokenized fraction of the real estate. For investors, fractional ownership and lower barriers help them to increase portfolio diversification and construct a “truer” market portfolio.
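
As a rough illustration of the arithmetic, the sketch below works in integer base units, the way 18-decimal tokens are typically accounted for. The total supply of 1,000,000 tokens and the function name are assumptions chosen so that one whole token corresponds to $1 of the apartment.

```python
DECIMALS = 18
UNIT = 10 ** DECIMALS  # base units per whole token


def base_units_for(investment_usd, property_value_usd, total_supply_tokens):
    """Base units bought by an investment, using integer arithmetic only."""
    return investment_usd * total_supply_tokens * UNIT // property_value_usd


# A $1 million apartment represented by a hypothetical 1,000,000 tokens.
units = base_units_for(50_000, 1_000_000, 1_000_000)
print(units // UNIT)                       # 50000 whole tokens
print(units * 100 // (1_000_000 * UNIT))   # 5 (percent of the supply)
```

Since each token splits into 10^18 base units, the smallest tradable fraction here is far below one cent, which is the divisibility advantage over fiat.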

The increase in liquidity helps unlock value for markets through liquidity premiums. When illiquid assets become more liquid, a liquidity premium of approximately 20–30% is unlocked. One example is real estate: even a fractional improvement in the sales price of such investments could result in trillions of dollars of new value for issuers and resellers.

Key Principle 2: Programmability built into tokens

Programmability refers to the ability to introduce certain business logic into smart contracts, allowing for automated events to occur. Tokenization can also lead to easier management of investors and their rights. Secondary transactions can be easily tracked by collaborating with third-party exchanges, allowing investors to receive distributions and exercise their other rights (e.g., voting) through the blockchain.

Programmability is especially useful in increasing the speed of settlements. In traditional finance, settlement refers to the process of documenting the transfer of asset ownership before ownership actually changes hands. Compliance can be programmed into the tokens, provided all participants have a digital identity that has gone through the relevant compliance/KYC/AML checks.
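
To give a feel for what "programmed into the tokens" can mean, here is a toy Python model of two such rules: transfers restricted to verified accounts, and pro-rata distributions to holders. The class, its method names and the figures are hypothetical, and a real security token would implement this logic in a smart contract rather than in Python.

```python
class CompliantToken:
    """Toy model of a token with built-in compliance and distributions."""

    def __init__(self, balances, verified):
        self.balances = dict(balances)
        self.verified = set(verified)  # accounts that passed KYC/AML checks

    def transfer(self, sender, recipient, amount):
        """Move tokens only between accounts that passed the checks."""
        if sender not in self.verified or recipient not in self.verified:
            raise PermissionError("both parties must pass KYC/AML checks")
        if self.balances.get(sender, 0) < amount:
            raise ValueError("insufficient balance")
        self.balances[sender] -= amount
        self.balances[recipient] = self.balances.get(recipient, 0) + amount

    def distribute(self, income_cents):
        """Split income pro rata by balance; amounts are rounded down."""
        supply = sum(self.balances.values())
        if supply == 0:
            return {}
        return {holder: income_cents * balance // supply
                for holder, balance in self.balances.items()}


token = CompliantToken({"alice": 600, "bob": 400}, verified={"alice", "bob"})
print(token.distribute(250_000))  # {'alice': 150000, 'bob': 100000}
```

A transfer to an account that has not been verified, such as `token.transfer("alice", "carol", 10)`, raises `PermissionError` in this sketch.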

Key Principle 3: Immutable proof of ownership

Blockchains are immutable and keep a public trace of every transfer and owner. This digital trace of transactions not only proves the history of ownership but also helps reduce fraud. The immutable structure makes it impossible for a token-holder to “double-sell” a token (that is, to transfer the same token to two different recipients). This helps assure investors that no one can falsify transactions after the transaction has happened.

Let’s dive deeper into tokenization.

Tokenization is the process of digitally storing the property rights to a thing of value (asset) on a blockchain or distributed ledger, so that ownership can be transferred via the blockchain’s protocol. What are the other challenges?

Issue #1: What are the requirements for tokenization to take place?

There are 3 fundamental requirements:

1. The rights to an asset can be stored digitally on a blockchain

Let’s go back to the real estate example. If I want to tokenize my house, I must be able to record my ownership of my house on a token itself. This means that to regulatory authorities, holding a token represents an ownership right or claim right on the house itself. (We will go into these rights in a bit.)

2. These rights can be legally transferred via blockchains

Once I can document my rights to my house in a legally recognized way, I should be able to transfer these rights to anyone I want, and that person will have legal ownership of my house, assuming my tokens are imbued with ownership rights.

3. Tokens can be easily exchanged for value, giving the assets “value”

Lastly, like any security, I must be able to exchange my real estate token for value easily, so that value can be ascribed to the asset.

Issue #2: What are the other legal issues to consider?

Apart from the 3 requirements, what is even more crucial to note is exactly what you are tokenizing: does the token represent a claim on the asset, or does it represent actual ownership of the asset itself? Investors and token issuers must think carefully about what exactly a token represents.

The truth is: it depends on what you want to tokenize. Tokenization is flexible. Using real estate as an example again, what can be tokenized could be direct ownership in the real estate (being a partial equity owner), right to rental income, or even the right to use an asset (renting the apartment).

Hence, a token could represent ownership of the underlying real asset, an interest in a debt secured by the asset, an equity interest in a legal entity that owns that asset, or a right to the cash flow from the asset.

There are 3 basic categories of rights to understand.

The rights bestowed by tokenized securities (or security tokens) can be very complex to understand. However, tokenized securities can include claims to the assets (and usually the resulting cash flows), direct ownership rights, governance rights or a combination of these.

Claim rights: Claims to only certain specific uses (and claims) of the asset
Ownership rights: Equity ownership and control of the asset
Governance: System by which a group of people can come to unified decisions

Let’s illustrate this with real estate again, with a few examples on the token holders’ rights:

Claim rights, but no ownership rights: Token holders are entitled to cash flows from ongoing leases, but they have no ‘equity’ and ‘ownership’ of the underlying real estate
Claim rights, AND ownership rights: Token holders are the ‘owners’ of the underlying real estate with claims to the cash flows. They can make decisions directly: how much to charge for rent, which investments to make to maintain the real estate, and whom to hire as staff. They also receive the proceeds from the sale of the real estate.
Only ownership rights: This example is rarely the case, but it means that token holders are now the ‘equity owners’ of the real estate.

What are the challenges that arise from these different rights?

It is possible that there is a separation of claim rights and ownership rights, and this creates misaligned incentives between both parties.

What if… the tokens have ownership rights for token holders? How do 1,000 token holders make decisions collectively in the best interest of the assets? Is there a need for delegated voting or decision making?

What if… the tokens only imbue claim rights for token holders? The token issuers (owners) can reduce profits and cash flows to the token holders, by re-investing the profits. This will be to the detriment of token holders who originally look towards the future cash flow.

The smart contract geek might ask: can’t one automate all of this logic in smart contracts?

No, smart contracts cannot solve all these issues.

Contracts and smart contracts are incomplete:

1. Contracts are only enforceable when events and actions can be verified by a third party

This is the long-standing problem of “oracles” in tokenization. Some events can be captured in code, yet it is near impossible for any arbiter to determine whether they really happened.

For instance, suppose I issue real estate tokens so that holders can receive a portion of the rental income. If I do not document all the rental agreements, token holders cannot know the real amount of rental income. If accurate reporting cannot be enforced effectively, there is no rational reason for parties to abide by the smart contract.

2. It is near impossible to write a contract that contains all possible conditions and events, hence achieving “completeness”

The problem with contracts is not what is in them, it is what is not in them. It is very costly and operationally challenging to write down every condition and event. Furthermore, events and conditions change in real life — and contracts have to adapt to these real-life changes.

Given the limitations of smart contracts as inherently incomplete, certain asset types should not be tokenized.

What should not be tokenized?

1. When the blockchain cannot fully capture the change of ownership of assets

There are some assets in markets where I can sell the physical asset outside of the protocol directly, despite it being tokenized. For instance, I can tokenize real estate and transfer the token (with ownership rights) to you, but it is possible I can also legally sell the same real estate to another.

There are also cases where I can trade tokens but cannot verify the authenticity of the underlying asset. Real estate is relatively easy to verify, but other assets, such as gold bars, are harder. If verifying authenticity takes a lot of cost and resources, tokenization might not be a viable solution.

2. When the use of prices impedes the protocol from achieving its objectives

There are some situations in which we don’t want prices to determine who gets what. Sometimes, prices do not capture external societal benefits and costs, and might not be the most equitable way to allocate resources. Social goods are one example.

3. Sometimes, we just do not want to tokenize some ‘assets’ and rights

For instance, rights to birth certificates or educational records should not be tokenized since they represent a unique right. We do not, and should not tokenize these ‘assets’.

Clearly, there are some asset classes that _should not_ be tokenized, given real-life limitations.

How do you really tokenize?

We have walked through the what and why of tokenization. Now let’s talk a little about the how.

There are a few categories of assets that have been tokenized:

Fiat currencies

The tokenization of fiat currencies gave rise to stablecoins. Tether is the first example, creating USDT. However, there are inherent challenges with Tether. For a good, updated summary on stablecoins, I suggest this:

Stablecoins are now officially in vogue again
_With seemingly one new project unveiling multi-million-dollar funding every other week, no one will blame you for…_
blog.goodaudience.com

Gold

An example of a gold tokenization project is Digix.

Each DGX token is 1:1 gold-backed, and 1 token represents 1 gram of 99.99% gold from London Bullion Market Association-certified refiners, with gold stored in The Safehouse vault. Purchasing 1 DGX token is equivalent to purchasing actual gold itself.
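As a small worked example of the 1 token to 1 gram peg, the sketch below converts a DGX balance into grams and values it against a gold price quoted per troy ounce. The function names and the price used are assumptions for illustration; only the 1:1 gram backing comes from the text, and the troy ounce conversion factor is the standard definition.

```python
GRAMS_PER_TROY_OUNCE = 31.1034768  # standard definition of the troy ounce


def dgx_to_grams(tokens):
    """Each DGX token represents 1 gram of gold."""
    return tokens * 1.0


def dgx_value(tokens, price_per_troy_ounce):
    """Value of a DGX balance, given a gold price quoted per troy ounce."""
    return dgx_to_grams(tokens) / GRAMS_PER_TROY_OUNCE * price_per_troy_ounce


# Assumed price of 2,000 per troy ounce, purely for illustration.
print(round(dgx_value(100, 2000.0), 2))
```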

Real Estate

My primary interest lies in real estate, given the analogy between REITs and tokenized real estate. A few interesting examples are how a Manhattan real estate property was most recently tokenized, or how a portion of the St. Regis Aspen is tokenized. In the case of St. Regis Aspen, each Aspen token represents an indirect ownership interest in a common stock of the St. Regis Aspen REIT. According to Elevated Returns, the “REIT provides tax efficient structure while the blockchain provides peer-to-peer investing and cross-border transaction made simpler for investors.”

Clearly, there are many challenges associated with tokenization.

1. Lack of tokenization standards and legal infrastructure

Tokenization is not simply the creation of a token — any Solidity developer can do it. Instead, it’s about the design of the whole system, including understanding the various rights and issues we’ve talked about previously.

How do tokenization standards cater for the following issues?

Incentives (claim rights, ownership rights, governance)
Privileges of users and system admins (who operate the token contracts)
Life-cycle management of an asset (issuance, payouts, withdrawal)
Security management
Integration of KYC/AML requirements across different jurisdictions
Integration with exchanges
Interoperability between different public chains
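Several items on this list (admin privileges, issuance, and KYC/AML integration) can be illustrated with a minimal in-memory model of a transfer-restricted token. This is a sketch under stated assumptions, not an implementation of any published standard: the class `RestrictedToken`, its methods, and the idea of a single admin maintaining an allow-list of verified holders are all hypothetical.

```python
class RestrictedToken:
    """Toy ledger in which only verified holders may receive or send tokens."""

    def __init__(self, admin):
        self.admin = admin
        self.balances = {}
        self.verified = set()  # holders who passed KYC/AML checks off-chain

    def _require_admin(self, caller):
        if caller != self.admin:
            raise PermissionError("only the admin may do this")

    def verify(self, caller, holder):
        """Admin records that a holder has passed identity checks."""
        self._require_admin(caller)
        self.verified.add(holder)

    def issue(self, caller, to, amount):
        """Admin issues new tokens to a verified holder."""
        self._require_admin(caller)
        if to not in self.verified:
            raise PermissionError("recipient is not verified")
        self.balances[to] = self.balances.get(to, 0) + amount

    def transfer(self, sender, to, amount):
        """Move tokens between two verified holders."""
        if sender not in self.verified or to not in self.verified:
            raise PermissionError("both parties must be verified")
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] -= amount
        self.balances[to] = self.balances.get(to, 0) + amount


token = RestrictedToken(admin="issuer")
token.verify("issuer", "alice")
token.verify("issuer", "bob")
token.issue("issuer", "alice", 100)
token.transfer("alice", "bob", 40)
print(token.balances)
```

The check runs on every transfer, not only at issuance, which is the property that makes cross-jurisdiction rules hard: the allow-list would need to encode different requirements for different holders.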

In the case of cross-chain interoperability, different chains have different characteristics. For instance, Ethereum has scalability issues but supports more complex, Turing-complete smart contracts. How about other public blockchain networks like Stellar, IOST or Zilliqa?

How can tokenized assets (in the form of tokens) be interoperable across these different chains?

2. Digital identity that’s globally and legally-recognized

From a regulatory point of view, it is a nightmare for assets to be issued and transferred across citizens of different legal jurisdictions.

Suppose I am an EU resident looking to tokenize my real estate and the token confers only claim rights. How do I transfer this token to U.S. persons, whilst taking into account their identity, KYC/AML issues, U.S. regulations, taxes and all the other issues?

How can I reasonably and easily deal with a verified, attested U.S. person in a way that is legally compliant in both our national jurisdictions?

3. Tokenization does not mean instant liquidity

Liquidity is the biggest challenge in the security token space and it does not happen organically. History has given us various examples of financial markets and instruments that have not yet achieved significant levels of liquidity. Bringing in institutional investors or accredited retail investors through custodian solutions will be key to creating liquidity. Of course, the underlying asset must be useful.

How do we introduce long-term, sustainable solutions for large institutional investors — the market makers — to create and maintain liquidity?

What does a tokenized future look like?

I’m generally bullish on a tokenized future: a fairer, more equitable world with lower barriers to entry and lower capital requirements for individuals or businesses.

By capturing value in tokenized assets, we can re-create all the sophistication of the existing financial and operational world we live in, with far lower operational costs and complexity. When combining tokenization with reasonably complex business logic enabled by smart contracts, we can represent complex business interactions faithfully and more efficiently.

There will be interoperability, through standardization

ERC 20 for token standards, as an example

If the ecosystem for global assets becomes interoperable, it means we can hold ownership claims to a commercial building, early-stage equity, corporate bonds, a T-bill, a single-family residence, and a decentralized network on the same platform.

Different assets can reference each other contractually and interact in an automated way. It means an increased liquidity for all (tokenized) asset classes.
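To show what a token standard actually standardizes, here is a rough in-memory Python model of the ERC 20 interface. The method names mirror the functions defined by the standard (`totalSupply`, `balanceOf`, `transfer`, `approve`, `allowance`, `transferFrom`), written in snake_case. Everything else is an assumption for illustration: the class name `SimpleToken`, passing the caller explicitly instead of using a transaction sender, and the omission of events. Real ERC 20 tokens are smart contracts, typically written in Solidity.

```python
class SimpleToken:
    """In-memory sketch of the ERC 20 function set (no events, no gas)."""

    def __init__(self, initial_holder, supply):
        self._total_supply = supply
        self._balances = {initial_holder: supply}
        self._allowances = {}  # (owner, spender) -> remaining allowance

    def total_supply(self):
        return self._total_supply

    def balance_of(self, owner):
        return self._balances.get(owner, 0)

    def transfer(self, sender, to, value):
        if value < 0 or self.balance_of(sender) < value:
            raise ValueError("insufficient balance")
        self._balances[sender] = self.balance_of(sender) - value
        self._balances[to] = self.balance_of(to) + value
        return True

    def approve(self, owner, spender, value):
        """Owner lets spender move up to `value` tokens on their behalf."""
        self._allowances[(owner, spender)] = value
        return True

    def allowance(self, owner, spender):
        return self._allowances.get((owner, spender), 0)

    def transfer_from(self, spender, owner, to, value):
        if self.allowance(owner, spender) < value:
            raise ValueError("allowance exceeded")
        self.transfer(owner, to, value)
        self._allowances[(owner, spender)] -= value
        return True


token = SimpleToken("issuer", 1_000)
token.transfer("issuer", "alice", 250)
token.approve("alice", "exchange", 100)
token.transfer_from("exchange", "alice", "bob", 60)
print(token.balance_of("alice"), token.balance_of("bob"), token.allowance("alice", "exchange"))
```

Because every compliant token exposes the same small set of functions, a wallet or exchange written against this interface can handle any of them, which is the interoperability the section describes.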

ERC 725 for Identity, as another

Fabian Vogelsteller, co-author of the ERC 20 standard, is leading the effort to create a unique decentralized identity for “humans, groups, objects and machines”. Quoting directly from the ERC 725 GitHub issue itself: “This identity can hold keys to sign actions (transactions, documents, logins, access, etc), and claims, which are attested from third parties (issuers) and self-attested (#ERC735), as well as a proxy function to act directly on the blockchain”.

You can read more about ERC 725 here:

ERC: Identity · Issue #725 · ethereum/EIPs
_eip: title: ERC-725 Identity author: Fabian Vogelsteller (@frozeman) discussions-to…_
github.com

There are notable projects that have been working on implementing ERC 725 identity contracts. A few examples are: Origin Protocol and Rate3 Network.

Managing Identity with a UI for ERC 725
_At Origin, we’re building a platform for decentralized, peer-to-peer marketplaces. You can imagine a future Airbnb-like…_
medium.com

Rate3 Cross-Chain Identity Protocol — Identity and Claims (ERC 725, ERC735)
_At Rate3, we initially wanted to build a blockchain-based settlement and clearance network for businesses. We…_
medium.com
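The idea of an identity that holds keys and claims, as quoted from the ERC 725 proposal above, can be pictured with a small data model. This is only a conceptual sketch: the classes `Identity` and `Claim`, their fields, and the sample issuer names are hypothetical and do not reproduce the contract interface of ERC 725 or ERC 735. In particular, no cryptographic signatures are checked here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    topic: str   # what is being asserted, e.g. "kyc-passed"
    issuer: str  # who attests to it
    data: str    # free-form payload (a real claim would carry a signature)


@dataclass
class Identity:
    owner: str
    keys: dict = field(default_factory=dict)    # key id -> purpose
    claims: list = field(default_factory=list)

    def add_key(self, key_id, purpose):
        self.keys[key_id] = purpose

    def add_claim(self, claim):
        self.claims.append(claim)

    def has_claim(self, topic, trusted_issuers):
        """True if some trusted third party has attested to this topic."""
        return any(
            c.topic == topic and c.issuer in trusted_issuers for c in self.claims
        )

    def self_attested(self):
        """Claims the identity owner made about themselves."""
        return [c for c in self.claims if c.issuer == self.owner]


alice = Identity(owner="alice")
alice.add_key("key-1", "sign transactions")
alice.add_claim(Claim("kyc-passed", "licensed-verifier", "checked 2018"))
alice.add_claim(Claim("nickname", "alice", "Al"))
print(alice.has_claim("kyc-passed", {"licensed-verifier"}))
```

A token contract could then ask whether a holder's identity carries a claim from an issuer it trusts, instead of running identity checks itself.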

The future of tokenization is not here (yet), but it will be sooner than we know

We are optimistic and bullish for the future of tokenization and tokenized securities. There are many elements of the envisioned tokenized future that we observe today:

1. Governments are increasingly partnering with private companies to create infrastructural solutions

One such example is the collaboration between NASDAQ, Monetary Authority of Singapore (Singapore’s Central Bank) and Singapore Exchange (Singapore’s main stock exchange) to develop Delivery versus Payment capabilities for settlement of tokenized assets across different blockchain platforms to improve operational efficiency and reduce settlement risks.

MAS and SGX partner Anquan, Deloitte and Nasdaq to harness blockchain technology
_Singapore, 24 August 2018… The Monetary Authority of Singapore (MAS) and Singapore Exchange (SGX) today announced a…_
www.mas.gov.sg
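Delivery versus Payment means the asset leg and the cash leg of a trade settle together or not at all. The sketch below shows that all-or-nothing property on two in-memory ledgers. It is an illustration only: the function `settle_dvp` and the ledgers are hypothetical, and settling across separate blockchain platforms, as in the project above, needs additional coordination that is not modeled here.

```python
def settle_dvp(asset_ledger, cash_ledger, seller, buyer, quantity, price):
    """Move `quantity` tokens seller -> buyer and `price` cash buyer -> seller.

    Both legs are validated before either ledger is touched, so a failed
    settlement leaves both ledgers exactly as they were.
    """
    if quantity <= 0 or price < 0:
        raise ValueError("invalid quantity or price")
    if asset_ledger.get(seller, 0) < quantity:
        raise ValueError("seller cannot deliver the asset")
    if cash_ledger.get(buyer, 0) < price:
        raise ValueError("buyer cannot pay")

    # Delivery leg
    asset_ledger[seller] -= quantity
    asset_ledger[buyer] = asset_ledger.get(buyer, 0) + quantity
    # Payment leg
    cash_ledger[buyer] -= price
    cash_ledger[seller] = cash_ledger.get(seller, 0) + price


assets = {"seller": 10}
cash = {"buyer": 500}
settle_dvp(assets, cash, "seller", "buyer", quantity=4, price=200)
print(assets, cash)
```

The settlement risk the text mentions is the case this function rules out: one party delivering while the other fails to pay.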

2. Projects have recognized the need for compliance, and are creating solutions that target automated compliance and AML/KYC

We have touched on the need to meld real-world legal requirements into the blockchain space. Various projects have been doing this globally:

Harbor: A compliance platform and protocol to ensure tokenized securities comply with existing securities laws at issuance and on every trade, everywhere across the globe.
Rate3 Network: A protocol that handles asset-tokenization and identity management across both Ethereum and Stellar blockchains.
Polymath: A security token platform on which regulatory-compliant tokens can be built

I notice more blockchain projects building tokenization solutions targeted at different asset classes and exploring different ways of modeling structured finance, for instance by issuing both debt and equity tokens. More importantly, these projects know that working directly with regulatory authorities and collaborating with central banks and other projects will help to improve the overall ecosystem.

Ensuring the legally-compliant design of the whole system is key.

3. “Paths of least resistance” will help everyone relate existing real examples to upcoming tokenization projects

Real estate has always been cited as an example for tokenization projects. This is due to the structure of real estate investment trusts (REITs), which one can relate more easily to tokenized structures.

Tokenized real estate is not the same as a REIT, but various principles, such as property rights and the economics of REITs, can help us understand, relate and think better.

Not everything will be tokenized, but those that can be will be.

Disclosure: I work at Rate3 Network, a dual-protocol that handles asset-tokenization and identity management across both Ethereum and Stellar blockchains.