Super hard-core intelligent IDE series: SQL editor

Tofu fans, we meet again! Today two colleagues from ByteDance, "Chonger" and "Acinium", bring us the first article in a series on intelligent IDEs: the SQL editor. Read on and top up your knowledge!

Authors: Chonger & "Actin" Source: Original


An IDE is an application that integrates many complex functions. If you want to develop one, you need to pay attention to at least the following:

  1. Code editor layer (called the Editor layer in this article): syntax highlighting, smart prompts and completion, syntax diagnostics, hover documentation, formatting...
  2. Working directory (Workspace)
  3. Extensions (Extension)
  4. Run and debug layer (Debug)
  5. Environment configuration (Environment)
  6. Online deployment layer (Publish): if you are building a Cloud IDE, this layer is a required capability. It covers how to let users complete the "edit, debug, deploy" pipeline on the web, and how to ensure that the environment configuration is the same in the debugging and deployment phases.
  7. Version management (Version)

This article covers only the Editor layer, the tip of the iceberg above. We hope it inspires readers who are studying this area. It does not explain the technical principles behind each step in detail; those will be introduced in follow-up articles. If you happen to be building a SQL editor, this article can serve as a good reference.

This article is applicable to:

  • You are implementing a unique Editor of your own and need it to provide the capabilities listed in item 1 above. This Editor can be a text editor in the traditional sense, an editor made of many form fields or drop-down selections, or even a GUI page editor. In those cases we only need to translate syntax highlighting and smart prompts into the equivalent concepts.
  • You need to provide users with the ability to edit code in your application (not necessarily an IDE)
  • You are using a DSL (domain-specific language) to simplify development, and you need highlighting and prompts for its unique syntax
  • You are developing your own IDE or Cloud IDE

Table of contents

  • Starting from native Web HTML: how to highlight a piece of code and prompt for it
  • How open source editors do it
  • The birth of LSP
  • How to connect an open source Editor component with LSP: the SQL Editor case
  • SQL Language Server
  • In summary: what needs to be done to build a smart Editor

Start from scratch

Setting aside existing Editor components, let's implement highlighting with native HTML. We take one of Monaco's examples and look at how it could be implemented natively. The input is a piece of log content, and the highlighting rules are: date: green, notice: yellow, error: red, info: gray.

The key step of syntax highlighting is lexical analysis, also called tokenization. Its purpose is to split the input string into individual tokens, where a token is a run of characters that cannot be divided further. The analysis scans the source text, either by direct scanning or by regular-expression scanning [1]; the function that performs the analysis is called a lexical analyzer (lexer). A crude regular-expression implementation follows. It is not meant as a reference implementation; for complex tokenization, look for tools such as flex or ANTLR:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Highlight</title>
  <style>
    .custom-info { color: #808080; }
    /* bold is a font-weight value, not a font-style value */
    .custom-error { color: #ff0000; font-weight: bold; }
    .custom-notice { color: #FFA500; }
    .custom-date { color: #008800; }
  </style>
</head>
<body>
  <div id="log-editor"></div>
  <script>
    // Each rule pairs a regular expression with the CSS class of the token it matches.
    const tokenizer = {
      root: [
        [/\[error.*/, "custom-error"],
        [/\[notice.*/, "custom-notice"],
        [/\[info.*/, "custom-info"],
        [/\[[a-zA-Z 0-9:]+\]/, "custom-date"],
      ]
    };

    // Wrap the first match of every rule in a <span> that carries the rule's class.
    const highlight = (str) => {
      return tokenizer.root.reduce((pre, current) => {
        return pre.replace(current[0], (m) => {
          return `<span class="${current[1]}">${m}</span>`;
        });
      }, str);
    };

    const log = `
[Sun Mar 7 16:02:00 2004] [notice] Apache/1.3.29 (Unix) configured - resuming normal operations
[Sun Mar 7 16:02:00 2004] [info] Server built: Feb 27 2004 13:56:37
[Sun Mar 7 16:02:00 2004] [notice] Accept mutex: sysvsem (Default: sysvsem)
[Sun Mar 7 16:05:49 2004] [info] [client xx.xx.xx.xx] (104)Connection reset by peer: client stopped connection before send body completed
[Sun Mar 7 21:16:17 2004] [error] [client xx.xx.xx.xx] File does not exist:/home/httpd/twiki/view/Main/WebHome
[Sun Mar 7 21:20:14 2004] [info] [client xx.xx.xx.xx] (104)Connection reset by peer: client stopped connection before send body completed
`;

    // Highlight line by line and wrap every line in its own <div>.
    const innerHtml = log.split('\n').reduce((pre, current) => {
      return pre + `<div class="line">${highlight(current)}</div>`;
    }, '');

    window.addEventListener('DOMContentLoaded', () => {
      const wrapper = document.querySelector('#log-editor');
      wrapper.innerHTML = innerHtml;
    });
  </script>
</body>
</html>

Next, some crude textarea-based pseudo-code for simple smart prompts, again starting from one of Monaco's examples.

<script>
  // Candidate completions; the item shape follows the Monaco example.
  const suggestions = [
    {
      label: '"lodash"',
      documentation: "The Lodash library exported as Node.js modules.",
      insertText: '"lodash": "*"'
    },
    {
      label: '"express"',
      documentation: "Fast, unopinionated, minimalist web framework",
      insertText: '"express": "*"'
    }
  ];

  // TODO: a real implementation parses the text up to the cursor; that process is
  // covered in the analysis of the SQL Language Server smart prompts later in the
  // article. As a placeholder, only prefix-match the word before the cursor.
  const getSuggestions = (value, position) => {
    const word = value.slice(0, position.offset).split(/\s/).pop();
    return suggestions.filter((item) => item.label.startsWith(word));
  };

  window.addEventListener('DOMContentLoaded', () => {
    // Assumes the page contains <div id="editor"><textarea></textarea></div>.
    const textarea = document.querySelector('#editor textarea');
    // 'input' fires on every edit, whereas 'change' only fires when focus is lost.
    textarea.addEventListener('input', () => {
      // Calculate the position from the current caret location.
      const position = { offset: textarea.selectionStart };
      const value = textarea.value;
      const items = getSuggestions(value, position);
      // TODO: create a DOM list box for `items` at the input position
    });
  });
</script>

How open source editors do it

For an Editor, highlighting takes two steps:

  1. Parse the text into tokens and scopes according to the grammar
  2. Map each generated scope to a color and style

These editors let you register your own language id; you then write rules in the editor's token format to get highlighting.

However, most JavaScript Editors offer unsatisfactory support for smart prompts.

With CodeMirror and Ace you need to listen to the change event and handle prompts yourself:

editor.on('change', changeListener);

Monaco Editor is ahead in this respect: it lets you register providers for language features and takes care of displaying their return values in the UI, so you do not need to define the UI separately.

You can register a language, register smart prompts (completion), and register hover documentation. For the language parsing behind these providers, unless you use the approach described in the next section, you need to implement a parser for the language in JavaScript yourself.
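As a rough sketch of the three registrations just mentioned, the function below registers a language id, a Monarch tokenizer, a completion provider and a hover provider. It takes the monaco namespace as a parameter so that the snippet does not depend on how Monaco is loaded. The language id demo-sql, the keyword list and the hover text are made up for illustration.

```javascript
// Hypothetical language id and keyword list, used only for this sketch.
const LANGUAGE_ID = 'demo-sql';
const KEYWORDS = ['SELECT', 'FROM', 'WHERE'];

function registerDemoLanguage(monaco) {
  // 1. Register the language id.
  monaco.languages.register({ id: LANGUAGE_ID });

  // 2. Highlighting: the tokenizer maps text to token names,
  //    and the editor theme maps token names to colors.
  monaco.languages.setMonarchTokensProvider(LANGUAGE_ID, {
    ignoreCase: true,
    keywords: KEYWORDS,
    tokenizer: {
      root: [
        [/[a-zA-Z_]\w*/, { cases: { '@keywords': 'keyword', '@default': 'identifier' } }],
        [/\d+/, 'number'],
        [/'[^']*'/, 'string'],
      ],
    },
  });

  // 3. Smart prompts: Monaco renders the returned suggestions itself.
  monaco.languages.registerCompletionItemProvider(LANGUAGE_ID, {
    provideCompletionItems(model, position) {
      const word = model.getWordUntilPosition(position);
      const range = {
        startLineNumber: position.lineNumber,
        endLineNumber: position.lineNumber,
        startColumn: word.startColumn,
        endColumn: word.endColumn,
      };
      const suggestions = KEYWORDS
        .filter((keyword) => keyword.startsWith(word.word.toUpperCase()))
        .map((keyword) => ({
          label: keyword,
          kind: monaco.languages.CompletionItemKind.Keyword,
          insertText: keyword,
          range,
        }));
      return { suggestions };
    },
  });

  // 4. Hover documentation for the word under the mouse.
  monaco.languages.registerHoverProvider(LANGUAGE_ID, {
    provideHover(model, position) {
      const word = model.getWordAtPosition(position);
      if (!word || !KEYWORDS.includes(word.word.toUpperCase())) return null;
      return { contents: [{ value: `SQL keyword **${word.word.toUpperCase()}**` }] };
    },
  });
}
```

In a page that has loaded Monaco, calling registerDemoLanguage(monaco) once before creating an editor with language: 'demo-sql' should be enough. All language knowledge still lives in the browser here, which is the limitation the next section addresses.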

The birth of LSP

As shown above, even for the same language (JavaScript here), smart prompts have to be implemented separately for each Editor. In other words, every IDE has to implement JavaScript smart prompts on its own, and the same is true for every other language.

How can one set of language services be shared by different IDEs?
For example, a single JavaScript language server should be enough for multiple IDEs to use. Here we recommend the Language Server Protocol (LSP) that came out of VS Code (for a quick read, see an earlier study article) [2]. The protocol specifies that the IDE and the language server communicate using the parameter formats defined in the specification. The underlying interaction is JSON-RPC, a stateless remote procedure call protocol. The transport between the IDE's Client and the Server can be a socket, HTTP, or even stdio.
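To make the JSON-RPC layer concrete, here is a minimal sketch of how a message can be framed on a byte stream such as stdio or a raw socket: the LSP base protocol puts a Content-Length header in front of the JSON body. The helper names encodeMessage and decodeMessage are my own, and the initialize request is cut down to the bare minimum.

```javascript
// Encode a JSON-RPC message with the LSP base-protocol header.
function encodeMessage(message) {
  const body = JSON.stringify(message);
  // Content-Length counts bytes, not characters.
  return `Content-Length: ${Buffer.byteLength(body, 'utf8')}\r\n\r\n${body}`;
}

// Decode one framed message from a Buffer; returns null while it is incomplete.
function decodeMessage(buffer) {
  const headerEnd = buffer.indexOf('\r\n\r\n');
  if (headerEnd === -1) return null;
  const header = buffer.subarray(0, headerEnd).toString('ascii');
  const match = /Content-Length: (\d+)/i.exec(header);
  if (!match) throw new Error('Missing Content-Length header');
  const start = headerEnd + 4;
  const length = Number(match[1]);
  if (buffer.length < start + length) return null;
  return JSON.parse(buffer.subarray(start, start + length).toString('utf8'));
}

const request = { jsonrpc: '2.0', id: 1, method: 'initialize', params: { capabilities: {} } };
const framed = Buffer.from(encodeMessage(request), 'utf8');
console.log(decodeMessage(framed).method); // prints "initialize"
```

Message-oriented transports such as WebSocket already delimit messages, so implementations on top of them may send the JSON body on its own; that depends on the client and server libraries in use.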

How the Editor interacts with LS

The following uses SQL as a case to illustrate how the editor and a SQL Language Server interact. Here I established a WebSocket connection between the Client and the Server.

  1. Initialization: before the Editor is opened, the Client sends the Server an initialization message. The params.capabilities field in the message specifies the capabilities supported by the client, such as completion.

After the server receives the initialization request, it replies with the capabilities supported for the current language, such as documentFormattingProvider (formatting), hoverProvider (hover documentation), definitionProvider (go to definition), completionProvider (completion), and codeActionProvider. If the language does not support formatting, documentFormattingProvider is simply left out.

If documentFormattingProvider is not returned, the client will not display the formatting menu.
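The handshake can be sketched as follows. The method name initialize and the capability names come from the LSP specification; the request is simplified (a real client sends more fields), and handleInitialize and enabledFeatures are hypothetical helpers that only illustrate how the client can derive its menus from the server's answer.

```javascript
// Simplified initialize request; a real client sends more fields.
const initializeRequest = {
  jsonrpc: '2.0',
  id: 0,
  method: 'initialize',
  params: {
    capabilities: { textDocument: { completion: {}, hover: {} } },
  },
};

// Server side: answer with the features this language supports.
function handleInitialize(request) {
  return {
    jsonrpc: '2.0',
    id: request.id,
    result: {
      capabilities: {
        textDocumentSync: 1, // 1 = the full text is sent on every change
        hoverProvider: true,
        completionProvider: { triggerCharacters: ['.'] },
        // documentFormattingProvider is left out: formatting is not supported
      },
    },
  };
}

// Client side: only enable UI entries for capabilities that were returned.
function enabledFeatures(response) {
  const capabilities = response.result.capabilities;
  return {
    completion: Boolean(capabilities.completionProvider),
    hover: Boolean(capabilities.hoverProvider),
    formatting: Boolean(capabilities.documentFormattingProvider),
  };
}

console.log(enabledFeatures(handleInitialize(initializeRequest)));
// prints { completion: true, hover: true, formatting: false }
```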

  2. Open event: after the Editor is opened, the Client sends the Server an open message. The message body carries the current language, the source code, and a uri (which can be a file address or a virtual address, depending on the server's implementation).

  3. Change event: when the user types code, the Client sends the Server a change message, and the server decides whether to process it. This is similar to open: in this case the server diagnoses syntax errors as the user types, and the response is the same as for open.

  4. The Server can also actively push events to the Client. In my case the server sends a diagnostics event with the results of syntax diagnosis after an open or a change. Each diagnostic contains the location of the offending text and an error message: range holds the start and end positions, and message holds the message text.
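A sketch of the open and diagnostics messages, using the LSP method names textDocument/didOpen and textDocument/publishDiagnostics. The uri is an invented virtual address, and findUnclosedParen is a toy check that stands in for a real SQL parser.

```javascript
// Toy check standing in for a real SQL parser: locate an unclosed "(".
function findUnclosedParen(text) {
  const open = [];
  let line = 0;
  let character = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === '(') open.push({ line, character });
    if (ch === ')') open.pop();
    if (ch === '\n') { line += 1; character = 0; } else { character += 1; }
  }
  return open.length > 0 ? open[0] : null;
}

// Server side: build the notification that is pushed after an open or a change.
function publishDiagnostics(uri, text) {
  const start = findUnclosedParen(text);
  const diagnostics = start
    ? [{
        range: { start, end: { line: start.line, character: start.character + 1 } },
        severity: 1, // 1 = Error
        message: 'Unclosed parenthesis',
      }]
    : [];
  return { jsonrpc: '2.0', method: 'textDocument/publishDiagnostics', params: { uri, diagnostics } };
}

// Client side: the open notification. Notifications carry no id.
const didOpen = {
  jsonrpc: '2.0',
  method: 'textDocument/didOpen',
  params: {
    textDocument: {
      uri: 'inmemory://demo.sql', // virtual address, assumed for this sketch
      languageId: 'sql',
      version: 1,
      text: 'select count(* from t',
    },
  },
};

const opened = didOpen.params.textDocument;
console.log(JSON.stringify(publishDiagnostics(opened.uri, opened.text).params.diagnostics));
```

Positions in LSP messages are zero-based, so the range reported here points at the opening parenthesis on line 0.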

  5. Completion event: while the user is typing, the Client also sends the Server a completion request.

After the server receives the request, it performs a series of internal analyses and returns the items that can be completed. For example, when the user has typed

select * from a

the Server needs to complete the database name, and when the user has typed

select * from aaa.

it needs to complete the tables under the aaa database.

The server's response contains an id field, which is the id sent by the client. The server uses it to mark which request it is responding to, and the client uses it to match the response to the request it sent. This is needed because some actions are triggered many times within a short period, and with ids the Client can cancel an individual request. When a message carries no id, the event type is determined by its method.
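The id bookkeeping can be sketched on the client side like this. RequestTracker is a hypothetical helper, not part of any library; the method names textDocument/completion and $/cancelRequest are taken from the LSP specification, and the uri is invented.

```javascript
class RequestTracker {
  constructor(send) {
    this.send = send;          // function that delivers a message to the server
    this.nextId = 1;
    this.pending = new Map();  // id -> callback waiting for the result
  }

  // Send a request and remember which callback belongs to its id.
  request(method, params, onResult) {
    const id = this.nextId++;
    this.pending.set(id, onResult);
    this.send({ jsonrpc: '2.0', id, method, params });
    return id;
  }

  // Cancel one outstanding request, for example because the user kept typing.
  cancel(id) {
    if (this.pending.delete(id)) {
      this.send({ jsonrpc: '2.0', method: '$/cancelRequest', params: { id } });
    }
  }

  // Route a response to its callback; responses to cancelled ids are ignored.
  handleMessage(message) {
    const onResult = this.pending.get(message.id);
    if (onResult === undefined) return;
    this.pending.delete(message.id);
    onResult(message.result);
  }
}

const sent = [];
const tracker = new RequestTracker((message) => sent.push(message));
const params = {
  textDocument: { uri: 'inmemory://demo.sql' },
  position: { line: 0, character: 15 },
};

const first = tracker.request('textDocument/completion', params, () => console.log('stale'));
tracker.cancel(first);
const second = tracker.request('textDocument/completion', params, (items) => {
  console.log(items.map((item) => item.label));
});

tracker.handleMessage({ jsonrpc: '2.0', id: first, result: [] });
tracker.handleMessage({ jsonrpc: '2.0', id: second, result: [{ label: 'aaa' }] });
```

Only the second callback runs, because the response to the cancelled first request no longer has a pending entry.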

  6. Hover documentation: when the mouse hovers over a word, the Client sends the Server a hover request. The Server uses the mouse position sent by the Client to locate the current word in the abstract syntax tree and returns the corresponding documentation.
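A simplified sketch of the hover step. Instead of walking an abstract syntax tree, as described above, it only finds the word under the given position with a character scan, which is enough to show the request and response shapes. The DOCS table and the uri are invented; textDocument/hover is the LSP method name.

```javascript
// Hypothetical documentation table.
const DOCS = {
  select: 'SELECT: retrieves rows from one or more tables.',
  from: 'FROM: names the tables to read from.',
};

// Find the word that covers a zero-based { line, character } position.
function wordAt(text, position) {
  const lineText = text.split('\n')[position.line] || '';
  let start = position.character;
  let end = position.character;
  while (start > 0 && /\w/.test(lineText[start - 1])) start -= 1;
  while (end < lineText.length && /\w/.test(lineText[end])) end += 1;
  return start === end ? null : { word: lineText.slice(start, end), start, end };
}

// Server side: answer a hover request for the given document text.
function handleHover(request, text) {
  const { position } = request.params;
  const hit = wordAt(text, position);
  const doc = hit ? DOCS[hit.word.toLowerCase()] : undefined;
  const result = doc
    ? {
        contents: { kind: 'markdown', value: doc },
        range: {
          start: { line: position.line, character: hit.start },
          end: { line: position.line, character: hit.end },
        },
      }
    : null; // nothing to show for this position
  return { jsonrpc: '2.0', id: request.id, result };
}

const hoverRequest = {
  jsonrpc: '2.0',
  id: 5,
  method: 'textDocument/hover',
  params: { textDocument: { uri: 'inmemory://demo.sql' }, position: { line: 0, character: 2 } },
};
console.log(handleHover(hoverRequest, 'select * from t').result.contents.value);
```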

Language Server smart prompts

A Language Server needs to implement a subset of the features defined by LSP. Take the most important one, smart prompts, as an example. There are two steps.

  • The first step happens while you are interacting with the Editor. The content is changing as you edit, and the server needs to keep its copy of the changing code "file" up to date so that it is available when smart prompts are needed. If the LS and the Editor are on the same machine, the file system can be used directly; if they are separate, the LS has to update its own storage from the uri and content in each change event. Depending on the capabilities the LS declares, each change event delivers either the full content or an incremental change.
  • The second step: when the Editor decides that a smart prompt is needed (the LS declares triggerCharacters so that the Editor knows after which characters to ask for one), it sends a completion event to the LS containing the current cursor position (for example, the position provided by VS Code's Monaco editor is a lineNumber and a column, both starting from 1). From this position and the code stored in the first step, the LS performs a series of syntax analyses and returns everything that can be prompted, which is shown to the user like the drop-down list box in the GIF above.
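The first step can be sketched as a small in-memory store. DocumentStore and offsetAt are hypothetical names. The shape of the change events follows LSP: a change without a range replaces the whole text, and a change with a range replaces only that span. Positions are zero-based here, as in LSP messages.

```javascript
// Convert a zero-based { line, character } position into a string offset.
function offsetAt(text, position) {
  const lines = text.split('\n');
  const line = Math.min(position.line, lines.length - 1);
  let offset = 0;
  for (let i = 0; i < line; i++) offset += lines[i].length + 1; // +1 for '\n'
  return offset + Math.min(position.character, lines[line].length);
}

class DocumentStore {
  constructor() {
    this.documents = new Map(); // uri -> current text
  }

  open(uri, text) {
    this.documents.set(uri, text);
  }

  // Apply the contentChanges of one change event, in order.
  change(uri, contentChanges) {
    let text = this.documents.get(uri) ?? '';
    for (const change of contentChanges) {
      if (change.range === undefined) {
        text = change.text; // full sync: the event carries the whole document
      } else {
        const start = offsetAt(text, change.range.start);
        const end = offsetAt(text, change.range.end);
        text = text.slice(0, start) + change.text + text.slice(end);
      }
    }
    this.documents.set(uri, text);
    return text;
  }
}

const store = new DocumentStore();
store.open('inmemory://demo.sql', 'select * from a');
const at = { line: 0, character: 15 };
// Incremental change: insert "aa." at the end of the line.
console.log(store.change('inmemory://demo.sql', [{ range: { start: at, end: at }, text: 'aa.' }]));
// prints "select * from aaa."
```

When the completion request arrives, the server reads the text for the request's uri from this store and analyzes it at the requested position.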

The most critical point in this process is the second step: how to produce a set of smart prompts from a piece of code and a position in it. Many languages already have ready-made auto-completion libraries, such as Python's jedi. Here we take SQL as an example. Put simply, we need to run lexical analysis and syntax analysis on a SQL string to work out what code can be written next. These two analyses are the first half of the compiler "front end" in compiler theory: lexical analysis splits the code into tokens, and syntax analysis applies a set of defined rules to the token sequence to build a specific data structure. In a general compiler, the product of syntax analysis is an abstract syntax tree (AST), on which semantic analysis and optimization then continue. A standard SQL AST has the following structure:
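As an illustration only, and not the standard structure referred to above, here is a tiny lexer and hand-written parser for statements of the form SELECT columns FROM table. The token types and AST node shapes are assumptions made for this sketch.

```javascript
// Lexical analysis: the first matching rule decides the token type.
const TOKEN_RULES = [
  ['whitespace', /^\s+/],
  ['keyword', /^(select|from|where|distinct)\b/i],
  ['identifier', /^[A-Za-z_][A-Za-z0-9_]*/],
  ['number', /^\d+/],
  ['symbol', /^[*,.;()=]/],
];

function tokenize(sql) {
  const tokens = [];
  let rest = sql;
  let offset = 0;
  while (rest.length > 0) {
    const rule = TOKEN_RULES.find(([, regex]) => regex.test(rest));
    if (!rule) throw new Error(`Unexpected character at offset ${offset}`);
    const [type, regex] = rule;
    const text = regex.exec(rest)[0];
    if (type !== 'whitespace') tokens.push({ type, text, offset });
    rest = rest.slice(text.length);
    offset += text.length;
  }
  return tokens;
}

// Syntax analysis: SELECT column [, column]... FROM table
function parseSelect(tokens) {
  let index = 0;
  const peek = () => tokens[index];
  const expect = (type, text) => {
    const token = tokens[index];
    const ok = token !== undefined && token.type === type &&
      (text === undefined || token.text.toUpperCase() === text);
    if (!ok) throw new Error(`Expected ${text || type} at token ${index}`);
    index += 1;
    return token;
  };
  const parseColumn = () => {
    const token = peek();
    if (token !== undefined && token.text === '*') {
      index += 1;
      return { type: 'star' };
    }
    return { type: 'column', name: expect('identifier').text };
  };

  expect('keyword', 'SELECT');
  const columns = [parseColumn()];
  while (peek() !== undefined && peek().text === ',') {
    index += 1;
    columns.push(parseColumn());
  }
  expect('keyword', 'FROM');
  const from = { type: 'table', name: expect('identifier').text };
  return { type: 'select', columns, from };
}

console.log(JSON.stringify(parseSelect(tokenize('SELECT id, name FROM some_table;')), null, 2));
```

This parser throws on unfinished input such as SELECT FROM some_table, which is exactly the kind of text an editor sees most of the time.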

But to implement smart prompts, an AST alone is not enough. First, we need to be able to parse SQL code that is still being edited; second, we need to turn the parse result into prompt results. In other words, the grammar rules have to cover code in its unfinished, mid-edit state, and the parser's actions have to be defined so that its output carries more information useful for completion. For example, we use | to represent the cursor, and we have the following SQL waiting to be completed:

SELECT | FROM some_table;

We know that what normally needs to be completed here is the fields of the some_table table; it may also be functions or keywords. So after parsing the above SQL (note that it is parsed together with the cursor), we want a data structure like this:

{
  "AST": {...},
  "keywords": ["*", "DISTINCT"],
  "columns": true,
  "functions": true,
  "source": {
    "table": "some_table"
  }
}

From these attributes we can build the prompt list. The specific operations are as follows:

  1. keywords: every entry in the list goes into the prompt list
  2. functions: the field is true, so we add all known functions to the prompt list
  3. columns: the field is true, and from the source field we know we need to fetch all the fields of the some_table table and put them in the prompt list
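The three operations above can be sketched as one function. KNOWN_FUNCTIONS and SCHEMA are invented stand-ins for metadata that a real server would fetch from the database catalog, and the kind labels are my own.

```javascript
// Hypothetical metadata for this sketch.
const KNOWN_FUNCTIONS = ['COUNT', 'SUM', 'MAX'];
const SCHEMA = { some_table: ['id', 'name', 'created_at'] };

// Turn the analysis result into a flat list of prompt items.
function buildSuggestions(analysis) {
  const items = [];
  // 1. keywords: every entry goes into the list.
  for (const keyword of analysis.keywords || []) {
    items.push({ label: keyword, kind: 'keyword' });
  }
  // 2. functions: when the flag is set, add all known functions.
  if (analysis.functions) {
    for (const name of KNOWN_FUNCTIONS) items.push({ label: name, kind: 'function' });
  }
  // 3. columns: when the flag is set, add the fields of the source table.
  if (analysis.columns && analysis.source) {
    for (const column of SCHEMA[analysis.source.table] || []) {
      items.push({ label: column, kind: 'column' });
    }
  }
  return items;
}

const analysis = {
  keywords: ['*', 'DISTINCT'],
  columns: true,
  functions: true,
  source: { table: 'some_table' },
};
console.log(buildSuggestions(analysis).map((item) => item.label).join(', '));
// prints "*, DISTINCT, COUNT, SUM, MAX, id, name, created_at"
```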

Of course, this is just an example; you can add content to the analysis result as needed, typically things like prompt priority. As for turning these rules into a usable lexer and parser: the compiler front end is a mature field, and there are many tools (parser generators) that can do this for us, such as ANTLR, bison/yacc and lex, so we do not have to hand-write the parsing logic from the rules.

For this part, we recommend reading reference [1].


In summary, to make a language intelligent you need to implement a Language Server at the Server layer. The Server can be written in any programming language. VS Code provides a package that conforms to the LSP specification for developers to use [3].

If you are providing a language service for JavaScript developers, you can refer to [4].

At the Editor layer, if you are using Monaco Editor, you can build the language capabilities you want on top of [5]; if you are using CodeMirror or Ace, you can refer to [6].

Reference documents

[1] Lexical analysis
[2] LSP protocol
[3] vscode-languageserver
[4] typescript-language-server
[5] monaco-languageclient
[6] lsp-editor-adapter