Oil 0.8.pre5 - Progress in C++

This is the latest version of Oil, a Unix shell:

Oil version 0.8.pre5 - Source tarballs and documentation.

To build and run it, follow the instructions in INSTALL.txt. If you're new to the project, see Why Create a New Shell? and the 2019 FAQ.

Table of Contents

Semi-Automatic Translation to C++

Two Analogies: Go Compiler and TeX

DSLs and Code Generation

Wrapping Shell Dependencies

Appendix: Selected Metrics

Highlights

  • As of this release, we run spec tests against the oil-native binary! In other words, we're measuring how well the semi-automatic translation to C++ works.
    • Here are the results. The Python version of OSH passes 1560 tests (+), while the C++ version passes 420 tests. This is significant progress, but there's more to do, which I discuss below.
  • Koichi Murase made over a dozen fixes to OSH, motivated by running ble.sh (full changelog).
  • I made a few fixes to run the ShellSpec project. Notably, shopt -s extglob is now respected.
  • Internal: we have proper C++ unit tests and run them on our continuous build. I started using the greatest.h test framework, and it's simple and effective (Zulip thread).

I'd still like more bug reports! See How To Test OSH.

(+) The test harness currently reports 1539 because of a bug that will be fixed; the correct count is 1560.

Closed Issues

#758 Incorrect fnmatch due to extended glob syntax
#754 Implement test -u and test -g
#753 ${var+foo} shouldn't cause error when 'set -o nounset'
#727 1 ? (a=42) : b shouldn't require parentheses

Semi-Automatic Translation to C++

Two Analogies: Go Compiler and TeX

What's all this about C++? Here are two analogies to help explain what's going on.

  1. GopherCon 2014: Go from C to Go by Russ Cox (YouTube, 31 minutes). It's time for the Go compilers to be written in Go, not in C. I'll talk about the unusual process the Go team has adopted to make that happen: mechanical conversion of the existing C compilers into idiomatic Go code. (Grind is the one-off tool that helped with translation, analogous to mycpp.)

    The flavor of the work is similar to what I'm doing with Oil, but there's a key difference: Oil's source will remain in statically typed Python and DSLs like Zephyr ASDL for the foreseeable future. We won't be writing C++ by hand.

    Static types play an important role in both translations.

  2. How to compile the source code of TeX. Knuth wrote TeX in a dialect of Pascal, but it's not compiled with a Pascal compiler. Instead, it's translated to C and compiled with a C compiler.

The common thread is that we want to preserve the correctness of an existing codebase. Oil runs thousands of lines of existing bash scripts, including some of the biggest shell programs in the world.

Rewriting by hand would introduce a lot of bugs, so instead we write a custom translator and apply it to the codebase. In Oil's case, there are more code generators to remove dynamic typing and reflection, discussed below.

Recap

In addition to the new spec test metrics, these line counts give a feel for recent progress:

  • The 0.7.pre9 release in December.
    • osh_parse.cc has 9,867 lines of code (raw data). I showed that the OSH parser can be gradually refactored and translated to C++. Notably, the result is as fast as hand-written C code.
  • The 0.8.pre2 release in March.
    • osh_eval.cc has 16,491 lines of code. In addition to the parser, we translate the word and arithmetic evaluators.
  • This release, 0.8.pre5.
    • osh_eval.cc has 20,875 lines of code. We translate the command evaluator, including assignments. So the resulting C++ interpreter can run code like readonly x=y; echo $x. Details below.

For comparison, the slow OSH interpreter consists of about 30K lines of Python code. This doesn't include the Oil language, which I haven't started translating.

The translation isn't going as quickly as I'd like it to, but it's working, and I'm solving interesting technical problems along the way.

As far as I can tell, this unusual process is the shortest path to a fast shell. (As mentioned in January, I encourage parallel efforts. Feel free to ask me about this.)

Details

I keep a log of the translation process on Zulip.

  • Static typing of flag parsing was a big deal (Zulip thread). A common theme of translation is turning Python reflection into textual code generation, and this was another instance of it.
    • Assignment builtins like declare -g foo=bar now work, so we have a path to translate more shell builtins to C++.
  • Zephyr ASDL is turning into half of a programming language (Zulip thread). Specifically, it's a language for describing typed data, which Python is missing. It now supports dicts/maps with the syntax map[string, int].
  • The interpreter is still "pure", which is why only 420 tests pass. The nascent osh_eval.cc doesn't even run ls, because ls is an external process! But it understands the hairy details of word evaluation ${}, arithmetic evaluation $(( )), brace expansion {a,b}, and more.
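
To make the "reflection into textual code generation" theme concrete, here is a minimal Python sketch. The FLAG_SPEC table, the generate_flag_class helper, and the class name _ReadFlags are hypothetical names invented for illustration, not taken from Oil's source. The idea is only that fields which a dynamic version would create at runtime with setattr() are instead written out as source text with explicit types, which a static translator has a chance of handling.

```python
from typing import List, Tuple

# Hypothetical flag spec: (flag name, Python type name, default literal).
FLAG_SPEC: List[Tuple[str, str, str]] = [
    ("r", "bool", "False"),
    ("n", "int", "-1"),
    ("p", "str", "''"),
]


class DynamicFlags(object):
    """Reflection style: the fields only come into existence at runtime."""

    def __init__(self, spec: List[Tuple[str, str, str]]) -> None:
        for name, _, default in spec:
            setattr(self, name, eval(default))


def generate_flag_class(class_name: str,
                        spec: List[Tuple[str, str, str]]) -> str:
    """Code generation style: emit source text with one typed field per flag."""
    lines = ["class %s(object):" % class_name]
    lines.append("    def __init__(self) -> None:")
    for name, type_name, default in spec:
        lines.append("        self.%s: %s = %s" % (name, type_name, default))
    return "\n".join(lines) + "\n"


source = generate_flag_class("_ReadFlags", FLAG_SPEC)
print(source)
```

The generated text can be checked by a type checker and fed to a translator like any hand-written class, whereas DynamicFlags gives neither tool anything to look at.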

More background: the March recap had a similar section with Zulip threads: mycpp: The Good, the Bad, and the Ugly.

TODO on Translation

Even though about two-thirds of OSH translates to C++ and compiles, and much of it runs correctly, there's still a lot of work left.

Oil is simply a big project: recall that bash consists of over 140K lines of code. I estimate that OSH implements 80% of bash, with significant fixes. And Oil is a new language with many features on top.

DSLs and Code Generation

Oil's source code will remain in high-level languages for the foreseeable future, so we need to enhance the code generators to produce correct and fast C++.

  • mycpp
    • The OSH interpreter uses Python's try / finally for scoped destruction, but C++ doesn't have finally. We should probably use Python's context managers, and have mycpp translate such blocks into constructors and destructors.
  • Zephyr ASDL
    • The translation process deals with exceptions in a messy way, using something approximating #ifdef. Exceptions are more like structs than classes, so they could logically be expressed with ASDL schemas.
  • The pgen2 parser generator
    • The syntax of the Oil language is expressed with pgen2, and we don't have a C++ code generator for it yet. After discussion with Jason Miller, I think we should borrow the original code generator and runtime from CPython rather than try to translate the slow Python implementation.
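
The try / finally point under mycpp can be sketched in Python as follows. The State class and the ctx_Frame context manager are hypothetical, made up for this illustration; the sketch only shows why a context manager has a more direct C++ analogue than a finally block, since __enter__ and __exit__ line up with a constructor and a destructor.

```python
from typing import List


class State(object):
    """Toy interpreter state with a stack of variable frames."""

    def __init__(self) -> None:
        self.frames: List[str] = ["global"]


def run_with_finally(state: State, log: List[str]) -> None:
    # Cleanup lives in a finally block, which has no C++ equivalent.
    state.frames.append("func")
    try:
        log.append("body sees " + state.frames[-1])
    finally:
        state.frames.pop()


class ctx_Frame(object):
    """Hypothetical context manager: __enter__ plays the role of a C++
    constructor and __exit__ the role of a destructor."""

    def __init__(self, state: State, name: str) -> None:
        self.state = state
        self.name = name

    def __enter__(self) -> None:
        self.state.frames.append(self.name)

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        # Runs on normal exit and when an exception propagates.
        self.state.frames.pop()


def run_with_context(state: State, log: List[str]) -> None:
    with ctx_Frame(state, "func"):
        log.append("body sees " + state.frames[-1])
```

Both functions leave the frame stack as they found it; the second form puts setup and teardown in one class, which is the shape a translator could map onto a scoped C++ object.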

Wrapping Shell Dependencies

In the January blog roadmap, I mentioned that there are two technical problems with translation.

One of them was wrapping native C code, which I no longer see as a risk. It's just work. The shell has three main dependencies:

  1. libc. I've wrapped pure functions like fnmatch() in C++, and this is straightforward.
  2. The Unix kernel. Wrapping functions like execve() is similar to wrapping libc, but errno handling is an issue I want to revisit. (These Unix comics are relevant.)
  3. GNU readline for interactive features. To be honest, I'd rather punt interactive features to Oil code, analogous to ble.sh. But Oil should have basic readline support.
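
As a rough illustration of why errno handling matters when wrapping execve(), here is how a failure surfaces on the Python side today: os.execve() raises OSError, and its errno field carries the value set by the C-level call. The exec_or_status helper is hypothetical and not from Oil's source; the 127 and 126 statuses are the conventional shell exit codes for "not found" and "found but not executable". A C++ wrapper would need some way to carry the same information.

```python
import errno
import os
from typing import Dict, List


def exec_or_status(argv: List[str], env: Dict[str, str]) -> int:
    """Try to replace the current process; on failure return an exit status.

    On success os.execve() never returns. A real shell would call this
    in a forked child, not in the shell process itself.
    """
    try:
        os.execve(argv[0], argv, env)
    except OSError as e:
        if e.errno == errno.ENOENT:
            return 127  # conventional status: command not found
        if e.errno in (errno.EACCES, errno.ENOEXEC):
            return 126  # conventional status: found but not executable
        return 1
    return 0  # not reached: a successful execve() does not return
```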

Open Problems

  • The interpreter's memory management is probably the biggest open issue. I have ideas, but I haven't tested them with an implementation.
  • The autocompletion code makes good use of Python's yield, which I can't (or don't want to) use in C++. I might rewrite it with fork() and write() to a pipe.
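
The yield versus fork() and write() idea can be sketched like this. It is only an illustration of the general technique on a Unix system, with a made-up word list and hypothetical function names, not a description of how Oil's completion code is or will be written.

```python
import os
from typing import Iterator, List

WORDS = ["echo", "eval", "exec", "exit"]  # made-up candidate list


def candidates(prefix: str) -> Iterator[str]:
    """Generator style: completions are produced lazily with yield."""
    for word in WORDS:
        if word.startswith(prefix):
            yield word


def candidates_via_pipe(prefix: str) -> List[str]:
    """Same results without yield: a child process writes one candidate
    per line to a pipe, and the parent reads until end of file."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: produce candidates, then exit without returning.
        os.close(read_fd)
        status = 0
        try:
            for word in WORDS:
                if word.startswith(prefix):
                    os.write(write_fd, (word + "\n").encode("utf-8"))
        except Exception:
            status = 1
        finally:
            os._exit(status)
    # Parent: close the unused write end so read() sees end of file.
    os.close(write_fd)
    with os.fdopen(read_fd, "r") as f:
        data = f.read()
    os.waitpid(pid, 0)
    return data.splitlines()
```

The producer becomes an ordinary loop that writes bytes, which translates to C++ without coroutines, at the cost of a process per completion request.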

Plan for 2020

As mentioned in January, the bare minimum for "success" is when OSH replaces bash for my own use.

After reviewing all this work, I still feel like OSH can be "finished" in 2020. I won't be extremely surprised if it isn't, but it seems reasonable.

On the other hand, it seems clear that the Oil language will remain a prototype for all of 2020. I haven't gotten much feedback on it, probably because there isn't much documentation.

This is disappointing, but I don't have a solution to this problem.

In short, the project's focus has necessarily narrowed. The only two goals on my radar are:

  1. The OSH language should be translated to C++, tested, and optimized.
  2. The Oil language should be divorced from the Python runtime and similarly translated. This will almost certainly bleed into 2021.

I should write a longer blog post about this, but almost everything else is cut. Oil will be more like a library than a shell. (As mentioned, I'll need basic GNU readline support for my own use.)

The docs are another sore point. I've mostly been writing them "on demand" (whenever anyone asks). It seems like that pattern will continue, given all the other work that needs to be done.

What's Next?

  • Continue translating Oil to C++, guided by metrics.
    • Increase the number of spec tests passing from 430, shown in spec.wwz/cpp/osh-summary.html.
    • Increase the number of lines of code translating and compiling from 20,875.
  • Fix bugs reported by users. Bug reports really help! Again, see How to Test OSH.
  • Improve the OSH interpreter, especially with regard to errexit (issue 709). I'd also like to resume work on Running ble.sh With Oil.

Feel free to ask questions in the comments or on Zulip!

Appendix: Selected Metrics

Let's compare this release with the previous one, version 0.8.pre4.

Native Code Metrics

We have nearly 70K lines of C++ code, including over 20K translated by mycpp.

The size of the osh_eval.opt.stripped executable differs between GCC and Clang, and I don't yet know why. In any case, the increase is consistent with translating and compiling more lines of code.

Test Results

OSH spec tests:

There was no work on the Oil language! I'm a bit concerned by that, which is one reason for the scope reduction mentioned above.

Line Counts

We have ~300 new significant lines of code in OSH:

And ~500 new physical lines of code:

Benchmarks

The parsing benchmark didn't change much:

Nor did the runtime benchmark:
