Go in Go

Go in Go

As of the 1.5 release of Go, the entire system is now written in Go.(And a little assembler.)

C is gone.

Side note: gccgo is still going strong.

This talk is about the original compiler, gc .

Why was it in C?

Bootstrapping.

(Also Go was not intended primarily as a compiler implementation language.)

Why move the compiler to Go?

Not for validation; we have more pragmatic motives:

  • Go is easier to write (correctly) than C.
  • Go is easier to debug than C (even absent a debugger).
  • Go is the only language you'd need to know; encourages contributions.
  • Go has better modularity, tooling, testing, profiling, ...
  • Go makes parallel execution trivial.

Already seeing benefits, and it's early yet.

Design document: golang.org/s/go13compiler

Why move the runtime to Go?

We had our own C compiler just to compile the runtime.We needed a compiler with the same ABI as Go, such as segmented stacks.

Switching it to Go means we can get rid of the C compiler.That's more important than converting the compiler to Go.

(All the reasons for moving the compiler apply to the runtime as well.)

Now only one language in the runtime; easier integration, stack management, etc.

As always, simplicity is the overriding consideration.

History

Why do we have our own tool chain at all?

Our own ABI?

Our own file formats?

History, familiarity, and ease of moving forward. And speed.

Many of Go's big changes would be much harder with GCC or LLVM.

news.ycombinator.com/item?id=8817990

Big changes

All made easier by owning the tools and/or moving to Go:

  • linker rearchitecture
  • new garbage collector
  • stack maps
  • contiguous stacks
  • write barriers

The last three are all but impossible in C:

  • C is not type safe; don't always know what's a pointer
  • aliasing of stack slots caused by optimization

( Gccgo will have segmented stacks and imprecise (stack) collection for a while yet.)

Goroutine stacks

  • Until 1.2: Stacks were segmented.
  • 1.3: Stacks were contiguous unless executing C code (runtime).
  • 1.4: Stacks made contiguous by restricting C to system stack.
  • 1.5: Stacks made contiguous by eliminating C.

These were each huge steps, made quickly (led by khr@ ).

Converting the runtime

Mostly done by hand with machine assistance.

Challenge to implement the runtime in a safe language.

Some use of unsafe to deal with pointers as raw bits in the GC, for instance.

But less than you might think.

The translator (next sections) helped for some of the translation.

Converting the compiler

Why translate it, not write it from scratch? Correctness, testing.

Steps:

  • Write a custom translator from C to Go.
  • Run the translator, iterate until success.
  • Measure success by bit-identical output.
  • Clean up the code by hand and by machine.
  • Turn it from C-in-Go to idiomatic Go (still happening).

Translator

First output was C line-by-line translated to (bad!) Go.

Tool to do this written by rsc@ (talked about at GopherCon 2014).

Custom written for this job, not a general C-to-Go translator.

Steps:

  • Parse C code using new simple C parser ( yacc )
  • Remove or rewrite C-isms such as *p++ as an expression
  • Walk the C parse tree, print the C code in Go syntax
  • Compile the output
  • Run, compare generated code
  • Repeat

The Yacc grammar was translated by sam-powered hands.

Translator configuration

Aided by hand-written rewrite rules, such as:

  • this field is a bool
  • this function returns a bool

Also diff-like rewrites for things such as using the standard library:

diff {
-    g.Rpo = obj.Calloc(g.Num*sizeof(g.Rpo[0]), 1).([]*Flow)
-    idom = obj.Calloc(g.Num*sizeof(idom[0]), 1).([]int32)
-    if g.Rpo == nil || idom == nil {
-        Fatal("out of memory")
-    }
+    g.Rpo = make([]*Flow, g.Num)
+    idom = make([]int32, g.Num)
}

Another example

This one due to semantic difference between the languages.

diff {
-    if nreg == 64 {
-        mask = ^0 // can't rely on C to shift by 64
-    } else {
-        mask = (1 << uint(nreg)) - 1
-    }
+    mask = (1 << uint(nreg)) - 1
}

Grind

Once in Go, new tool grind deployed (by rsc@ ):

  • parses Go, type checks
  • records a list of edits to perform: "insert this text at this position"
  • at end, applies edits to source (hard to edit AST).

Changes guided by profiling and other analysis:

  • removes dead code
  • removes gotos
  • removes unused labels, needless indirections, etc.
  • moves var declarations nearer to first use

rsc.io/grind

Performance problems

Output from translator was poor Go, and ran about 10X slower.Most of that slowdown has been recovered.

Problems with C to Go:

  • C patterns can be poor Go; e.g.: complex for loops
  • C stack variables never escape; Go compiler isn't as sure
  • interfaces such as fmt.Stringer vs. C's varargs
  • no unions in Go, so use structs instead: bloat
  • variable declarations in wrong place

C compiler didn't free much memory, but Go has a GC.Adds CPU and memory overhead.

Performance fixes

Profile! (Never done before!)

  • move vars closer to first use
  • split vars into multiple
  • replace code in the compiler with code in the library: e.g. math/big
  • use interface or other tricks to combine struct fields
  • better escape analysis ( drchase@ ).
  • hand tuning code and data layout

Use tools like grind , gofmt -r and eg for much of this.

Removing interface argument from a debugging print library got 15% overall!

More remains to be done.

Technical benefits

Other benefits of the conversion:

Garbage collection means no more worry about introducing a dangling pointer.

Chance to clean up the back ends.

Unified 386 and amd64 architectures throughout the tool chain.

New architectures are easier to add.

Unified the tools: now one compiler, one assembler, one linker.

Compiler

GOOS=YYY GOARCH=XXX go tool compile

One compiler; no more 6g , 8g etc.

About 50K lines of portable code.

Even the registerizer is portable now; architectures well characterized.

Non-portable: Peepholing, details like registers bound to instructions.

Typically around 10% of the portable LOC.

Assembler

GOOS=YYY GOARCH=XXX go tool asm

New assembler, all in Go, written from scratch by r@ .

Clean, idiomatic Go code.

Less than 4000 lines, <10% machine-dependent.

Almost completely compatible with previous yacc and C assemblers.

How is this possible?

  • shared syntax originating in the Plan 9 assemblers
  • unified back-end logic (old liblink , now internal/obj )

Linker

GOOS=YYY GOARCH=XXX go tool link

Mostly hand- and machine- translated from C code.

New library, internal/obj , part of original linker, captures details about machines, writes object files.

27000 lines summed across 4 architectures, mostly tables (plus some ugliness).

  • arm : 4000
  • arm64 : 6000
  • ppc64 : 5000
  • x86 : 7500 ( 386 and amd64 )

Example benefit: one print routine to print any instruction for any architecture.

Bootstrap

With no C compiler, bootstrapping requires a Go compiler.

Therefore need to build or download a working Go installation to build 1.5 from source.

We use Go 1.4+ as the base to build the 1.5+ tool chain. (Newer is OK too.)

Details: golang.org/s/go15bootstrap

Future

Much work still to do, but 1.5 is mostly set.

Future work:

Better escape analysis.

New compiler back end using SSA (much easier in Go than C).

Will allow much more optimization.

Generate machine descriptions from PDFs (or maybe XML).

Will have a purely machine-generated instruction definition:

"Read in PDF, write out an assembler configuration".

Already deployed for the disassemblers.

Conclusions

Getting rid of C was a huge advance for the project.Code is cleaner, testable, profilable, easier to work on.

New unified tool chain reduces code size, increases maintainability.

Flexible tool chain, portability still paramount.

我来评几句
登录后评论

已发表评论数()

相关站点

+订阅
热门文章