<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/"
    xmlns:atom="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/" version="2.0">
    <channel>
        
        <title>
            <![CDATA[ Amr Hesham - freeCodeCamp.org ]]>
        </title>
        <description>
            <![CDATA[ Browse thousands of programming tutorials written by experts. Learn Web Development, Data Science, DevOps, Security, and get developer career advice. ]]>
        </description>
        <link>https://www.freecodecamp.org/news/</link>
        <image>
            <url>https://cdn.freecodecamp.org/universal/favicons/favicon.png</url>
            <title>
                <![CDATA[ Amr Hesham - freeCodeCamp.org ]]>
            </title>
            <link>https://www.freecodecamp.org/news/</link>
        </image>
        <generator>Eleventy</generator>
        <lastBuildDate>Sun, 24 May 2026 11:06:28 +0000</lastBuildDate>
        <atom:link href="https://www.freecodecamp.org/news/author/AmrDeveloper/rss.xml" rel="self" type="application/rss+xml" />
        <ttl>60</ttl>
        
            <item>
                <title>
                    <![CDATA[ How to Run SQL-Like Queries on C/C++ Files ]]>
                </title>
                <description>
                    <![CDATA[ Hello everyone! I'm a Software engineer who's interested in low-level programming, compilers, and tool development. At the end of 2023, I published my first article on freeCodeCamp about how I created a SQL-like Language to run queries on local Git r... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/run-sql-like-queries-on-cplusplus-files/</link>
                <guid isPermaLink="false">66d45d9ba44b8bb91150f657</guid>
                
                    <category>
                        <![CDATA[ C++ ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Amr Hesham ]]>
                </dc:creator>
                <pubDate>Thu, 02 May 2024 19:35:48 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/04/gitql_banner-1.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Hello everyone! I'm a Software engineer who's interested in low-level programming, compilers, and tool development.</p>
<p>At the end of 2023, I published my first article on freeCodeCamp about how <a target="_blank" href="https://www.freecodecamp.org/news/gql-design-and-implementation/">I created a SQL-like Language to run queries on local Git repositories</a>. If you want a bit more context, give it a read.</p>
<p>At the start of 2024, the project got bigger and bigger with more features and amazing contributors, and I started to think: what if I could run SQL-like queries not only on .git files but on any kind of local and remote data?</p>
<p>In my last article about <a target="_blank" href="https://www.freecodecamp.org/news/how-to-run-sql-like-queries-on-files/">How to Run SQL-Like Queries on Files</a>, I explained the internal design of the GitQL SDK components and how to use it with any kind of data in general and how to implement the FileQL project.</p>
<p>In this article, I will explain how I used the GitQL SDK to implement the ClangQL (Clang Query Language) project, which is a tool that helps you run SQL-like queries on local C/C++ files.</p>
<h2 id="heading-how-i-came-up-with-the-clangql-project">How I Came Up with the ClangQL Project</h2>
<p>As I mentioned in my past articles, GitQL SDK can run SQL-like queries on any local or remote structured data. Also, the compiler parses your code into an AST (Abstract Syntax Tree) Data structure. So the question that jumped into my mind was, why not run the query on the Abstract Syntax Tree?</p>
<p>There were no limitations I could think of for implementing this idea, so I started to think of the two main requirements for using GitQL: creating the Data Schema to describe the table structures and columns types, and implementing the Data Provider component to provide the data which in our case is the ATS information and mapping it to the Engine format.</p>
<h3 id="heading-the-data-schema-for-the-cc-code">The Data Schema for the C/C++ Code</h3>
<p>You can think of the Data Schema as the place where we put structure and relationships of our data – for example, which tables we have, and for each table what columns they contain, and finally the types of each column.</p>
<p>This information is very useful when you're performing type checking and detecting if the user has written the wrong column name, for example, which is not defined in the selected table they want to use.</p>
<p>In our case, the tables can be classes, structs, enumerations, functions, variables and any other data that can be read from C++ such as macros and so on. But I decided to start simple with functions and variables only, then I planned to add other kinds.</p>
<p>So for the functions table, let's define what columns we need to include. The columns and types are not hard to guess, so let's take a normal function as an example. It has the name <code>Text</code>, and it returns type as <code>Text</code>, the number of parameters as <code>Int</code>, other C++ flags as Booleans (for example, is it a virtual function <code>is_virtual</code> or a pure virtual function <code>is_pure_virtual</code>?), and another flag to tell you if it is a static function <code>is_static</code>.</p>
<p>So to create a Data Schema you need to define two things: what tables you have, and what columns there are in this table. For example, in the functions table it will look like this:</p>
<pre><code class="lang-rust">lazy_static! {
    <span class="hljs-keyword">pub</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">ref</span> TABLES_FIELDS_NAMES: HashMap&lt;&amp;<span class="hljs-symbol">'static</span> <span class="hljs-built_in">str</span>, <span class="hljs-built_in">Vec</span>&lt;&amp;<span class="hljs-symbol">'static</span> <span class="hljs-built_in">str</span>&gt;&gt; = {
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> map = HashMap::new();
        map.insert(
            <span class="hljs-string">"functions"</span>,
            <span class="hljs-built_in">vec!</span>[
                <span class="hljs-string">"name"</span>,
                <span class="hljs-string">"signature"</span>,
                <span class="hljs-string">"args_count"</span>,
                <span class="hljs-string">"return_type"</span>,
                <span class="hljs-string">"class_name"</span>,
                <span class="hljs-string">"is_method"</span>,
                <span class="hljs-string">"is_virtual"</span>,
                <span class="hljs-string">"is_pure_virtual"</span>,
                <span class="hljs-string">"is_static"</span>,
                <span class="hljs-string">"is_const"</span>,
                <span class="hljs-string">"has_template"</span>,
                <span class="hljs-string">"access_modifier"</span>,
                <span class="hljs-string">"is_variadic"</span>,
                <span class="hljs-string">"file"</span>,
                <span class="hljs-string">"line"</span>,
                <span class="hljs-string">"column"</span>,
                <span class="hljs-string">"offset"</span>,
            ],
        );
    }
}
</code></pre>
<p>You also need to define the expected data type for each column:</p>
<pre><code class="lang-rust">lazy_static! {
    <span class="hljs-keyword">pub</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">ref</span> TABLES_FIELDS_TYPES: HashMap&lt;&amp;<span class="hljs-symbol">'static</span> <span class="hljs-built_in">str</span>, DataType&gt; = {
        <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> map = HashMap::new();
        map.insert(<span class="hljs-string">"name"</span>, DataType::Text);
        map.insert(<span class="hljs-string">"type"</span>, DataType::Text);
        map.insert(<span class="hljs-string">"signature"</span>, DataType::Text);
        map.insert(<span class="hljs-string">"args_count"</span>, DataType::Integer);
        map.insert(<span class="hljs-string">"return_type"</span>, DataType::Text);
        map.insert(<span class="hljs-string">"class_name"</span>, DataType::Text);
        map.insert(<span class="hljs-string">"is_method"</span>, DataType::Boolean);
        map.insert(<span class="hljs-string">"is_virtual"</span>, DataType::Boolean);
        map.insert(<span class="hljs-string">"is_pure_virtual"</span>, DataType::Boolean);
        map.insert(<span class="hljs-string">"is_static"</span>, DataType::Boolean);
        map.insert(<span class="hljs-string">"is_const"</span>, DataType::Boolean);
        map.insert(<span class="hljs-string">"has_template"</span>, DataType::Boolean);
        map.insert(<span class="hljs-string">"access_modifier"</span>, DataType::Integer);
        map.insert(<span class="hljs-string">"is_variadic"</span>, DataType::Boolean);
        map
    };
}
</code></pre>
<p>Now let's move on to the most exciting part: the Data Provider.</p>
<h3 id="heading-the-data-provider-for-the-cc-code">The Data Provider for the C/C++ Code</h3>
<p>The data provider component is used to tell the engine how to load the target data – for example from where and on which thread – and provide these data in a format that is known by our GitQL Engine. So how we can extract that information from our C/C++ code?</p>
<p>Well, we need to get the AST after parsing the C/C++ code. So the first option is to write a C/C++ parser to parse the files and provide the AST. But this option has some problems here: it'll require a lot of hard work, as C++ is a large language. To write a parser from scratch means you need to support every new feature, and handle errors, and so on.</p>
<p>The other option is to take a well-written C/C++ parser from any Compiler that provides the parser as a library and use it to provide the AST. After some searching, I found that the Clang Compiler is well-designed and can provide the parser as a library to use it to build other tools such as code formatter and linter.</p>
<p><strong>LibClang</strong> is written in C++ so I used binding for the Rust Programming language to parse the source file as a <a target="_blank" href="https://en.wikipedia.org/wiki/Translation_unit_%28programming%29"><strong>TranslationUnit</strong></a>. This is the parent node that contains information about classes, functions, and so on.</p>
<p>LibClang provides more than one way to visit the <a target="_blank" href="https://en.wikipedia.org/wiki/Translation_unit_%28programming%29">TranslationUnit</a> and all of the children of it. One of them is using the <code>clang_visitChildren</code> function. It takes a function pointer that gives you the Node and its parent and returns the flag as <code>int</code>. Using this flag, you can control if you want to break, continue, or walk inside this node using the return type.</p>
<p>For example if you are visiting the Class or Struct node and want to visit the methods inside them, you need to return <code>CXChildVisit_Recurse</code> – and <code>clang_visitChildren</code> will provide the methods for you. But if you want to just read class info then you need to return <code>CXChildVisit_Continue</code> to continue to other nodes. Using those flags in the wrong way can lead to performance issues and visiting many nodes that aren't useful.</p>
<p>So to get a function's info, we need to call <code>clang_visitChildren</code> as we pass a pointer to our data to save the information we got. For example:</p>
<pre><code class="lang-python">let mut functions: Vec&lt;FunctionNode&gt; = Vec::new();
let data = &amp;mut functions <span class="hljs-keyword">as</span> *mut Vec&lt;FunctionNode&gt; <span class="hljs-keyword">as</span> *mut c_void;

let cursor = clang_getTranslationUnitCursor(translation_unit);
clang_visitChildren(cursor, visit_children, data);
</code></pre>
<p>We passed <code>visit_children</code> that point to the function that extracts the C/C++ function's information. It will look like this:</p>
<pre><code class="lang-python">extern <span class="hljs-string">"C"</span> fn visit_children(
    cursor: CXCursor,
    parent: CXCursor,
    data: *mut c_void,
) -&gt; CXChildVisitResult {

    let cursor_kind = clang_getCursorKind(cursor);
    <span class="hljs-keyword">if</span> cursor_kind == CXCursor_FunctionDecl
        || cursor_kind == CXCursor_CXXMethod
        || cursor_kind == CXCursor_FunctionTemplate
    {
        let function_name = clang_getCursorSpelling(cursor);
        let function_type = clang_getCursorType(cursor);
        let result_type = clang_getResultType(function_type);
        let arguments_count = clang_getNumArgTypes(function_type);

        // ... Extracing more <span class="hljs-keyword">and</span> more information

        <span class="hljs-keyword">return</span> CXChildVisit_Continue
    }

    CXChildVisit_Recurse
}
</code></pre>
<p>Also, if you want to refactor or build advanced searching tools on top of ClangQL, you'll need to get the source code location. For example, where exactly does the function you're searching for exist – on which file and line?</p>
<p>So to get them from Clang, you can use the below code. It provides the file name, line, column and offset data of the selected node:</p>
<pre><code class="lang-python">let cursor_location = clang_getCursorLocation(cursor);

let mut file: CXFile = std::ptr::null_mut();
let mut line: u32 = <span class="hljs-number">0</span>;
let mut column: u32 = <span class="hljs-number">0</span>;
let mut offset: u32 = <span class="hljs-number">0</span>;

clang_getFileLocation(
    cursor_location,
    &amp;mut file,
    &amp;mut line,
    &amp;mut column,
    &amp;mut offset,
);

let file_name = clang_getFileName(file);
let file_name_str = CStr::from_ptr(clang_getCString(file_name)).to_string_lossy();
</code></pre>
<p>The source code of <code>visit_children</code> is too large to include because, as you can see, the function node contains a lot of information. So you can check the full and updated code for all visitors from this file in the ClangQL repository: <a target="_blank" href="https://github.com/AmrDeveloper/ClangQL/tree/master/src/visitor">DataProviderFile</a>.</p>
<p>The LibClang creators provide clear <a target="_blank" href="https://clang.llvm.org/docs/LibClang.html">documentation</a> on how to walk through the Translation Unit and extract the needed data.</p>
<p>So now we have our Data Schema and Provider, and we can perform a query like <code>SELECT * FROM functions</code>. The result will be likes this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/04/clangql_demo.png" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>The result of running a query to select all function information from one file</em></p>
<p>So after that I decided to name the project ClangQL which stands for Clang Query Language. Now I'm working on extracting more and more important information from the AST (feel free to contribute).</p>
<p>You can find the full source code with all customizations in the <a target="_blank" href="https://github.com/AmrDeveloper/ClangQL">ClangQL repository</a>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You can check out the <a target="_blank" href="https://github.com/AmrDeveloper/ClangQL">ClangQL</a> project as a full sample created only in three files.</p>
<p>If you liked the project, you could give it a star ⭐ on <a target="_blank" href="https://github.com/AmrDeveloper/GQL">GitQL</a> and <a target="_blank" href="https://github.com/AmrDeveloper/ClangQL">ClangQL</a>.</p>
<p>You can check out the <a target="_blank" href="https://amrdeveloper.github.io/GQL/">website</a> for how to download and use the project on different operating systems.</p>
<p>The project is not done yet – this is just the start. Everyone is welcome to join and contribute to the project and suggest ideas or report bugs.</p>
<p>You can sponsor my work on <a target="_blank" href="https://github.com/sponsors/AmrDeveloper">GitHub</a> ❤️.</p>
<p>Thanks for reading</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How to Run SQL-Like Queries on Files ]]>
                </title>
                <description>
                    <![CDATA[ Hello everyone! I'm a Software engineer who is interested in low-level programming, compilers, and tool development. At the end of 2023, I published my first article on freeCodeCamp about how I created a SQL-like Language to run queries on local Git ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/how-to-run-sql-like-queries-on-files/</link>
                <guid isPermaLink="false">66d45d99cc7f04d2549a371c</guid>
                
                    <category>
                        <![CDATA[ Git ]]>
                    </category>
                
                    <category>
                        <![CDATA[ SQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Amr Hesham ]]>
                </dc:creator>
                <pubDate>Tue, 12 Mar 2024 12:33:46 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2024/03/fileql_banner.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Hello everyone! I'm a Software engineer who is interested in low-level programming, compilers, and tool development.</p>
<p>At the end of 2023, I published my first article on freeCodeCamp about how <a target="_blank" href="https://www.freecodecamp.org/news/gql-design-and-implementation/">I created a SQL-like Language to run queries on local Git repositories</a>. If you want a bit more context, give it a read.</p>
<p>At the start of 2024, the project got bigger and bigger with more features and amazing contributors, and I started to think: what if I could run SQL-like queries not only on .git files but on any kind of local and remote data?</p>
<p>In this article, I will take you on a journey of updating the design of the GitQL project to be used also as an SDK. I will also explain how I used it to implement the FileQL project, which is a tool to run the SQL-like query on local files.</p>
<h2 id="heading-the-first-use-case-for-this-idea">The First Use Case for this Idea</h2>
<p>My first idea was to be able to use the same features of GitQL to build FileQL, which is a tool that allows you to run queries on a local file system.</p>
<p>Following that, everyone can use the GitQL project as an SDK to build their XQL. For example, LogQL, WeatherQL, CodeQL, AudioQL, BookQL, and so on.</p>
<h2 id="heading-how-i-started-to-think-about-the-gitql-sdk">How I Started to Think About the GitQL SDK</h2>
<p>The first question was: what can be a different between GitQL and FileQL? This part could be dynamic depending on the data format and how to read them.</p>
<p>The answer was two components. Let's go over them in the following sections.</p>
<h3 id="heading-the-first-component-is-the-data-schema">The first component is the Data Schema</h3>
<p>In each SQL-like query, we need to perform some checks to make sure that everything is valid. For example, in a query like <code>SELECT UPPER(name), commit_count + 1 FROM branches</code>, we need to perform the following checks:</p>
<ul>
<li><p>Check that there is a table with name branches.</p>
</li>
<li><p>The field <code>name</code> has the type of text so it can passed to the function <code>UPPER</code> without any problems.</p>
</li>
<li><p>The field <code>commit_count</code> has type the type of integer, so that we can use it with the plus operator and another integer.</p>
</li>
</ul>
<p>These checks can be implemented if we are aware of the table names, field names, and types. This information was static in the GitQL project, but now, when I want to convert it to an SDK, I need to make it dynamic so any SDK user can modify it depending on their own data.</p>
<p>So, I encapsulated all the needed info in a component called DataSchema, and once the user passes it to the SDK, all checks will work correctly.</p>
<h3 id="heading-the-second-component-is-the-data-provider">The second component is the Data Provider</h3>
<p>Once we have defined the DataSchema component to make it easier to perform checks on data, we have to move to the next question: how can we provide the data to the GitQL Engine?</p>
<p>In GitQL, we have static functions to provide the data from .git files, but in the SDK, we don't only work with .git files, and we should support working with any kind of data.</p>
<p>So, the idea is to define an interface between the GitQL Engine and the SDK user to provide any kind of data in the needed format for the Engine. This component is called DataProvider, and I will explain the implementation details in the next section.</p>
<h2 id="heading-the-design-and-implementation-of-the-gitql-sdk">The Design and Implementation of the GitQL SDK</h2>
<p>The goal is to allow the SDK user to pass their own definition of Data Schema and Provider and integrate them easily with the other GitQL components such as Tokenizer, Parser, Checker, Functions, and Engine.</p>
<h3 id="heading-how-to-design-the-data-schema">How to design the Data Schema</h3>
<p>The data schema should contain two kinds of information. Firstly, it should define the correct tables and field names, and secondly, it should specify the data types for those fields.</p>
<p>For example, in the case of FileQL, the correct table and field names are:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">ref</span> TABLES_FIELDS_NAMES: HashMap&lt;&amp;<span class="hljs-symbol">'static</span> <span class="hljs-built_in">str</span>, <span class="hljs-built_in">Vec</span>&lt;&amp;<span class="hljs-symbol">'static</span> <span class="hljs-built_in">str</span>&gt;&gt; = {
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> map = HashMap::new();
    map.insert(
        <span class="hljs-string">"files"</span>,
        <span class="hljs-built_in">vec!</span>[<span class="hljs-string">"path"</span>, <span class="hljs-string">"parent"</span>, <span class="hljs-string">"extension"</span>, <span class="hljs-string">"is_dir"</span>, <span class="hljs-string">"is_file"</span>, <span class="hljs-string">"size"</span>],
    );
    map
};
</code></pre>
<p>Here, we define only one table called <code>files</code>, which has six fields: <code>path</code>, <code>parent</code>, <code>extension</code>, <code>is_dir</code>, <code>is_file</code>, and <code>size</code>.</p>
<p>In the other map, we define the correct data type for each field. For example:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">pub</span> <span class="hljs-keyword">static</span> <span class="hljs-keyword">ref</span> TABLES_FIELDS_TYPES: HashMap&lt;&amp;<span class="hljs-symbol">'static</span> <span class="hljs-built_in">str</span>, DataType&gt; = {
    <span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> map = HashMap::new();
    map.insert(<span class="hljs-string">"path"</span>, DataType::Text);
    map.insert(<span class="hljs-string">"parent"</span>, DataType::Text);
    map.insert(<span class="hljs-string">"extension"</span>, DataType::Text);
    map.insert(<span class="hljs-string">"is_dir"</span>, DataType::Boolean);
    map.insert(<span class="hljs-string">"is_file"</span>, DataType::Boolean);
    map.insert(<span class="hljs-string">"size"</span>, DataType::Integer);
    map
};
</code></pre>
<p>Then, we create an instance of <code>Schema</code>, and construct it using the two maps. It should pass them to the Data Schema instance list like this:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">let</span> schema = Schema {
    tables_fields_names: TABLES_FIELDS_NAMES.to_owned(),
    tables_fields_types: TABLES_FIELDS_TYPES.to_owned(),
};
</code></pre>
<h3 id="heading-how-to-design-the-data-provider">How to design the Data Provider</h3>
<p>The goal of the Data Provider component is to load the data and map them into the GitQL Engine object structure, so we can define it as an interface with a single function:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">pub</span> <span class="hljs-class"><span class="hljs-keyword">trait</span> <span class="hljs-title">DataProvider</span></span> {
    <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">provide</span></span>(
        &amp;<span class="hljs-keyword">self</span>,
        env: &amp;<span class="hljs-keyword">mut</span> Environment,
        table: &amp;<span class="hljs-built_in">str</span>,
        fields_names: &amp;[<span class="hljs-built_in">String</span>],
        titles: &amp;[<span class="hljs-built_in">String</span>],
        fields_values: &amp;[<span class="hljs-built_in">Box</span>&lt;<span class="hljs-keyword">dyn</span> Expression&gt;],
    ) -&gt; GitQLObject;
}
</code></pre>
<p>The SDK user can implement this interface for their own kind of data and make it work with different data.</p>
<p>Also, you can control how many threads you need and what extra parameters you want. For example, in FileQL I implemented it with the name <code>FileDataProvider</code>, and passed the base path to search as parameter.</p>
<p>You can also implement it in any way. For example, <code>APIDataprovider</code>, and load the data from server and map them into <code>GitQLObject</code>. You could also implement is as <code>LogDataProvider</code>, and so on. The main idea is the same – just provide the data to the engine.</p>
<h3 id="heading-how-to-use-the-sdk-components-together">How to use the SDK Components together</h3>
<p>The GitQL SDK has four main components, and each one can be used for many purposes. However, all of them can be used and integrated easily with each other to run the SQL-like query on your data.</p>
<p>First of all, there is the GitQL CLI component, which contains the required functions to deal with the command line interface, such as the arguments parser, diagnostic reporter, and table render.</p>
<p>Next, there is the GitQL AST component. This component contains the required structures for the SDK, such as the AST nodes, functions, schema, data types, and values.</p>
<p>There is also the GitQL Parser component, which is used to perform lexical, syntax, and semantic analysis on the query. It takes the SQL-like query as a string. If everything is correct, it returns an AST node. Otherwise, it returns a Compile time error message as a string.</p>
<p>Lastly, there is the GitQL Engine component. The Engine component contains the Engine and DataProvider, so it takes your implementation of the DataProvider and the AST and evaluates each node on the data. In the end, it returns the data as a result or a runtime error as a string.</p>
<p>After adding the GitQL SDK crates to your project and configuring the Data Schema and Provider for your data, we can start using the GitQL SDK:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> env = Environment::new(schema);
<span class="hljs-keyword">let</span> query = ...;

<span class="hljs-keyword">let</span> <span class="hljs-keyword">mut</span> reporter = DiagnosticReporter::default();
<span class="hljs-keyword">let</span> tokenizer_result = tokenizer::tokenize(query.to_owned());
<span class="hljs-keyword">let</span> tokens = tokenizer_result.ok().unwrap();
<span class="hljs-keyword">if</span> tokens.is_empty() {
    <span class="hljs-keyword">return</span>;
}

<span class="hljs-keyword">let</span> parser_result = parser::parse_gql(tokens, &amp;<span class="hljs-keyword">mut</span> env);
<span class="hljs-keyword">if</span> parser_result.is_err() {
    <span class="hljs-keyword">let</span> diagnostic = parser_result.err().unwrap();
    reporter.report_diagnostic(&amp;query, *diagnostic);
    <span class="hljs-keyword">return</span>;
}

<span class="hljs-keyword">let</span> query_node = parser_result.ok().unwrap();
<span class="hljs-keyword">let</span> provider: <span class="hljs-built_in">Box</span>&lt;<span class="hljs-keyword">dyn</span> DataProvider&gt; = <span class="hljs-built_in">Box</span>::new(FileDataProvider::new(base_path.to_owned()));
<span class="hljs-keyword">let</span> evaluation_result = engine::evaluate(&amp;<span class="hljs-keyword">mut</span> env, &amp;provider, query_node);
</code></pre>
<p>The code above takes the query as a string and processes it until getting the evaluation result from the engine:</p>
<ul>
<li><p>Create an Environment instance using the <code>DataSchema</code> to track types.</p>
</li>
<li><p>Create an instance of <code>DiagnosticEngine</code> to use it for error reporting.</p>
</li>
<li><p>Pass the query to the tokenizer to convert the string into a list of tokens.</p>
</li>
<li><p>Pass the list of tokens to the parser to convert it to <code>TreeDataStructure</code>.</p>
</li>
<li><p>Create an instance of your <code>DataProvider</code> and pass it with the tree to the engine.</p>
</li>
<li><p>The engine returns the evaluation result which is an error or data.</p>
</li>
</ul>
<p>Those components are not new at all, besides Data Schema and Provider, and you can enjoy reading about the design and implementation details in the <a target="_blank" href="https://www.freecodecamp.org/news/gql-design-and-implementation/">first article</a>.</p>
<p>This is almost all you need to make the project work, but you can add more customization and extra components, such as CLI arguments. The final result will be like this:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2024/02/fql_demo.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p><em>Demo for FileQL project running on local files</em></p>
<p>You can find the full source code with all customizations in the <a target="_blank" href="https://github.com/AmrDeveloper/FileQL">FileQL repository</a>.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>You can check the <a target="_blank" href="https://github.com/AmrDeveloper/FileQL">FileQL</a> project as a full sample created only in three files.</p>
<p>If you liked the project, you could give it a star ⭐ on <a target="_blank" href="https://github.com/AmrDeveloper/GQL">GitQL</a> and <a target="_blank" href="https://github.com/AmrDeveloper/FileQL">FileQL</a></p>
<p>You can check the <a target="_blank" href="https://amrdeveloper.github.io/GQL/">website</a> for how to download and use the project on different operating systems.</p>
<p>The project is not done yet – this is just the start. Everyone is welcome to join and contribute to the project and suggest ideas or report bugs.</p>
<p>Thanks for reading!</p>
 ]]>
                </content:encoded>
            </item>
        
            <item>
                <title>
                    <![CDATA[ How I Created a SQL-like Language to Run Queries on Local Git Repositories ]]>
                </title>
                <description>
                    <![CDATA[ Hello everyone! I'm a Software engineer who's interested in low-level programming, compilers, and tool development. Three months ago I decided to learn the Rust programming language and build a Git client that focuses on simplicity and productivity. ... ]]>
                </description>
                <link>https://www.freecodecamp.org/news/gql-design-and-implementation/</link>
                <guid isPermaLink="false">66d45d9773634435aafcef62</guid>
                
                    <category>
                        <![CDATA[ Git ]]>
                    </category>
                
                    <category>
                        <![CDATA[ SQL ]]>
                    </category>
                
                <dc:creator>
                    <![CDATA[ Amr Hesham ]]>
                </dc:creator>
                <pubDate>Thu, 26 Oct 2023 17:00:00 +0000</pubDate>
                <media:content url="https://www.freecodecamp.org/news/content/images/2023/10/gitql_banner.png" medium="image" />
                <content:encoded>
                    <![CDATA[ <p>Hello everyone! I'm a Software engineer who's interested in low-level programming, compilers, and tool development.</p>
<p>Three months ago I decided to learn the Rust programming language and build a Git client that focuses on simplicity and productivity.</p>
<p>‌I started to think about how I could build the Git client to provide some unique and useful features.</p>
<p>For example, I like the analysis page on GitHub that tells you how many commits each developer has made and how many lines they've inserted or deleted. But what if I want to get this analysis for some period of time, or order everything by inserted lines and not number of commits? Or order them by how many commits were made by week or month?</p>
<p>You can add a custom sorting option for the client, right? But I started thinking about how I could make it more dynamic. This motivated me to wonder if I could run SQL-like queries on the local .git files so I could query any information I wanted.</p>
<p>So imagine if you could run a query like this on your local git repositories:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">name</span>, <span class="hljs-keyword">COUNT</span>(<span class="hljs-keyword">name</span>) <span class="hljs-keyword">AS</span> commit_num <span class="hljs-keyword">FROM</span> commits <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">name</span> <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> commit_num <span class="hljs-keyword">DESC</span> <span class="hljs-keyword">LIMIT</span> <span class="hljs-number">10</span>
</code></pre>
<p>I have implemented this idea with a project I made called <strong>GQL</strong> (Git Query Language). And in this article, I'm going to show you how I designed and implemented the functionality.</p>
<h2 id="heading-how-can-you-take-a-sql-like-query-and-run-it-on-git-files">How Can You Take a SQL-like Query and Run it on .git Files?</h2>
<p>The first idea I had was to use SQLite. But there were some problems I couldn't resolve.</p>
<p>For example, I couldn't customize the syntax, and I didn't want to read .git files and store them on a SQLite database and then perform the query. I wanted everything to run on the fly.</p>
<p>I also wanted to be able to use not only the SELECT, DELETE, and UPDATE commands but also provide commands related to Git like <code>push</code>, <code>pull</code>, and so on.</p>
<p>I've created different tools like compilers before, so why not create a SQL-like language from scratch and make it perform queries on the fly and see if it works?</p>
<h2 id="heading-how-i-designed-and-implemented-a-query-language-from-scratch">How I Designed and Implemented a Query Language from Scratch</h2>
<p>I wanted to start small by only supporting the <code>SELECT</code> command without advanced features such as aggregations, grouping, joining, and so on.</p>
<p>So I planned to parse the query into a data structure that would make it easy to perform validation and evaluation on it (like type checking and displaying helpful error messages if anything went wrong). After that, I would pass this data structure to the evaluator that would apply the query on my .git files.</p>
<h3 id="heading-choosing-a-data-structure-to-use">Choosing a data structure to use</h3>
<p>The best data structure for this case is to represent the query using an <a target="_blank" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">A<strong>bstract Syntax Tree</strong></a> (AST). This is a very common data structure used in compilers because it's fixable and make it easy to traverse and compose nodes inside others.</p>
<p>Also in this case, I didn't need to keep all the information about the query, only the information that needed for the next steps (this is why it's called Abstract).</p>
<h3 id="heading-deciding-what-validation-to-perform">Deciding what validation to perform</h3>
<p>The most important validation in this case would be type checking to make sure each value is valid and used in the correct place.</p>
<p>For example, what if the query wanted to multiply text by other text – would this be valid?</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-string">"ONE"</span> * <span class="hljs-string">"TWO"</span>
</code></pre>
<p>The multiplication operator expects both sides to be a number. So in this case, I wanted to inform the user that their query is invalid and try to help them understand the problem as much as possible.</p>
<p>So how would that work? When I see an operator like <code>*</code>, you need to check both sides to see if the values are valid types for this operator or not. If not then, report a message like this:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-string">"ONE"</span> * <span class="hljs-string">"TWO"</span>
<span class="hljs-comment">-------------^</span>

<span class="hljs-keyword">ERROR</span>: <span class="hljs-keyword">Operator</span> <span class="hljs-string">`*`</span> expects <span class="hljs-keyword">both</span> sides <span class="hljs-keyword">to</span> be <span class="hljs-built_in">Number</span> <span class="hljs-keyword">type</span> but got Text.
</code></pre>
<p>Beside operators, I knew that I needed to check whether each identifier was a table, field, alias of a function name, or if it should be undefined. I also needed to report an error if, for example, a branches table contained only 2 fields like the example below:</p>
<pre><code class="lang-sql">Branches {
   Text name,
   Number commit_count,
}
</code></pre>
<p>So I created a table that contained representations for all tables and fields so I could easily perform type checking. If the user tried to select a field which was undefined in this schema, then it reported an error:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> invalid_field_name <span class="hljs-keyword">FROM</span> branches
<span class="hljs-comment">-------------^</span>

<span class="hljs-keyword">Error</span>: <span class="hljs-keyword">Field</span> <span class="hljs-string">`invalid_field_name`</span> <span class="hljs-keyword">is</span> <span class="hljs-keyword">not</span> defined <span class="hljs-keyword">in</span> branches table.
</code></pre>
<p>I had to make sure the same checks would be performed on conditions, function names, and arguments. Then, if everything was properly defined and had the correct types, the AST would be valid and we could go to the next step.</p>
<h3 id="heading-what-happens-after-validating-the-abstract-syntax-tree">What happens after validating the Abstract Syntax Tree?</h3>
<p>After making sure everything was valid, it was time to evaluate the query and how it fetched the result.</p>
<p>To do that, I just traversed the syntax tree and evaluated each node. After finishing, I should have the correct result in a list.</p>
<p>Let's go through that process step by step to see how it works.</p>
<p>For example, in a query like this:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> branches WHEER <span class="hljs-keyword">name</span> <span class="hljs-keyword">LIKE</span> <span class="hljs-string">"%/main"</span> <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> commit_count LIMIE <span class="hljs-keyword">BY</span> <span class="hljs-number">5</span>
</code></pre>
<p>The AST representation will look like this:</p>
<pre><code class="lang-python">AbstractSyntaxTree {
  Select(*, <span class="hljs-string">"branches"</span>) 
  Where(Like(name, <span class="hljs-string">"%/main"</span>))
  OrderBy(commit_count)
  Limit(<span class="hljs-number">5</span>) 
}
</code></pre>
<p>Now we need to traverse and evaluate each node but in a specific order. We don't just go start to end or end to start because we need to do this in the same order that SQL would do it to get the same result.</p>
<p>For example in SQL, the <code>WHERE</code> statement must be executed before <code>GROUP BY</code>, and <code>HAVING</code> must be executed after.</p>
<p>In the above example, everything is in the correct order to execute, so let's see what each statement will do.</p>
<ul>
<li><code>Select(*, "branches")</code></li>
</ul>
<p>This will select all the fields from the table with the name <code>branches</code> and push them to a list – let's call it <code>objects</code>. But how can I select them from the local repository?</p>
<p>All information about commits, branches, tags, and so on is stored by Git on files inside a folder called <code>.git</code> in each repository. One option is to write a full parser from scratch to extract the needed information. But using a library to do this instead worked for me.</p>
<p>I decided to use the libgit2 library to perform this task. It's a pure C implementation of the Git core methods, so you can read all the information you need and to use it from Rust. There is a crate (Rust Library) created by the Rust official team called <code>git2</code>, so you can get the branch information easily like this:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">let</span> local_branches = repo.branches(<span class="hljs-literal">Some</span>(BranchType::Local));
<span class="hljs-keyword">let</span> remote_branches = repo.branches(<span class="hljs-literal">Some</span>(BranchType::Remote));
<span class="hljs-keyword">let</span> local_and_remote_branches = repository.branches(<span class="hljs-literal">None</span>);
</code></pre>
<p>and then iterate over each branch to get its information and store it like this:</p>
<pre><code class="lang-rust"><span class="hljs-keyword">for</span> branch <span class="hljs-keyword">in</span> local_and_remote_branches {
   <span class="hljs-comment">// Extract information from branch and store it</span>
}
</code></pre>
<p>Now we end up with list of all branches that we'll use in the next steps.</p>
<ul>
<li><code>Where(Like(name, "%/main"))</code></li>
</ul>
<p>This will filter the objects list and remove all items that do not match the conditions – in our case, those ending with "/main".</p>
<ul>
<li><code>OrderBy(commit_count)</code></li>
</ul>
<p>This sorts the objects list by the value of the field <code>commit_count</code>.</p>
<ul>
<li><code>Limit(5)</code></li>
</ul>
<p>This takes only the first five items and removes the rest from the objects list.</p>
<p>That's it! And now we end up with a valid result, which you can see below:</p>
<p><img src="https://www.freecodecamp.org/news/content/images/2023/10/gql_demo.gif" alt="Image" width="600" height="400" loading="lazy"></p>
<p>The examples below are valid and run correctly:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> <span class="hljs-number">1</span>
<span class="hljs-keyword">SELECT</span> <span class="hljs-number">1</span> + <span class="hljs-number">2</span>
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">LEN</span>(<span class="hljs-string">"Git Query Language"</span>)
<span class="hljs-keyword">SELECT</span> <span class="hljs-string">"One"</span> <span class="hljs-keyword">IN</span> (<span class="hljs-string">"One"</span>, <span class="hljs-string">"Two"</span>, <span class="hljs-string">"Three"</span>)
<span class="hljs-keyword">SELECT</span> <span class="hljs-string">"Git Query Language"</span> <span class="hljs-keyword">LIKE</span> <span class="hljs-string">"%Query%"</span>

<span class="hljs-keyword">SELECT</span> commit_count <span class="hljs-keyword">FROM</span> branches <span class="hljs-keyword">WHERE</span> commit_count <span class="hljs-keyword">BETWEEN</span> <span class="hljs-number">0</span> .. <span class="hljs-number">10</span>

<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> refs <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">type</span> = <span class="hljs-string">"branch"</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> refs <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">type</span>

<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> commits
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">name</span>, email <span class="hljs-keyword">FROM</span> commits
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">name</span>, email <span class="hljs-keyword">FROM</span> commits <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">name</span> <span class="hljs-keyword">DESC</span>
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">name</span>, email <span class="hljs-keyword">FROM</span> commits <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">name</span> <span class="hljs-keyword">LIKE</span> <span class="hljs-string">"%gmail%"</span> <span class="hljs-keyword">ORDER</span> <span class="hljs-keyword">BY</span> <span class="hljs-keyword">name</span>
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> commits <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">LOWER</span>(<span class="hljs-keyword">name</span>) = <span class="hljs-string">"amrdeveloper"</span>
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">name</span> <span class="hljs-keyword">FROM</span> commits <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">By</span> <span class="hljs-keyword">name</span>
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">name</span> <span class="hljs-keyword">FROM</span> commits <span class="hljs-keyword">GROUP</span> <span class="hljs-keyword">By</span> <span class="hljs-keyword">name</span> <span class="hljs-keyword">having</span> <span class="hljs-keyword">name</span> = <span class="hljs-string">"AmrDeveloper"</span>

<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> branches
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> branches <span class="hljs-keyword">WHERE</span> is_head = <span class="hljs-literal">true</span>
<span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">name</span>, <span class="hljs-keyword">LEN</span>(<span class="hljs-keyword">name</span>) <span class="hljs-keyword">FROM</span> branches

<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> tags
<span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> tags <span class="hljs-keyword">OFFSET</span> <span class="hljs-number">1</span> <span class="hljs-keyword">LIMIT</span> <span class="hljs-number">1</span>
</code></pre>
<h3 id="heading-how-to-support-running-on-multiple-repositories-at-the-same-time">How to support running on multiple repositories at the same time</h3>
<p>After I published GQL, I got amazing feedback from people. I also got some feature requests, like wanting support for multiple repositories and filtering by repository path.</p>
<p>I thought this was a great idea, because I could get analysis for multiple projects and also because I could do it on multiple threads. It didn't seem like it would be very hard to implement, either.</p>
<p>So after finishing the validation step for the AST, it's time for the evaluation step but instead of evaluating it once, it will be evaluated once for each repository and then merging all results back in one list.</p>
<p>But what about supporting the ability to filter by repository path?</p>
<p>That was pretty easy. Do you remember the branches table schema? All I needed to do was introduce a new field with name <code>repository_path</code> to represent the repository local path for this branch and introduce it to other tables too.</p>
<p>So the final schema will look like this:</p>
<pre><code class="lang-sql">Branches {
   Text name,
   Number commit_count,
   Text repository_path,
}
</code></pre>
<p>Now we can run a query that uses this field:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">FROM</span> branches <span class="hljs-keyword">WHERE</span> repository_path <span class="hljs-keyword">LIKE</span> <span class="hljs-string">"%GQL"</span>
</code></pre>
<p>And that's it! 😉</p>
<h3 id="heading-thanks-for-reading">Thanks for reading!</h3>
<p>If you liked the project, you can give it a star ⭐ on <a target="_blank" href="https://github.com/AmrDeveloper/GQL">github.com/AmrDeveloper/GQL</a>.</p>
<p>You can check the website <a target="_blank" href="https://amrdeveloper.github.io/GQL/"><strong>github.io/GQL</strong></a> for how to download and use the project on different operating systems.</p>
<p>The project is not done yet – this is just the start. Everyone is welcome to join and contribute to the project and suggest ideas or report bugs.</p>
 ]]>
                </content:encoded>
            </item>
        
    </channel>
</rss>
