Scala DSL #11
Hi Breandan, haven't heard from you & Thorsten for a while. What are you guys scheming on lately?
Hey @tribbloid, looks cool! I won't have time to review this for a couple of weeks, but I encourage you to keep going! Who do you see as the audience for the Scala DSL? Is this just for Scala users, or would it be possible to consume the Scala DSL from Kotlin/Java, and how would that look? It would be helpful to have some usage examples with different algorithms for reference, e.g. matrix multiplication, linear regression, MLPs, RNNs. I know you're a Scala fan, so I won't try to convert you, but have you looked into arrow-kt? I know it has a lot of Scala-like functionality. BTW, I recently looked into using TVM4J, but it looks like they're sorting out some stuff, so it might be good to circle back when they decide to publish a JVM library. Seems like TF4J is also making some progress, so it will probably be between the two. Haven't seen @tscholak in a while, but I saw he's doing a GSoC with hasktorch. It would be great to catch up at some point after all the dust settles. How are things on your end? Hopefully things return to normal again soon!
For arrow-kt: the Scala DSL should be for Scala users; Kotlin follows a different design philosophy. But I'm tempted, so I wrote some random showcases in both Scala & Kotlin:
val x3 = DoubleVector(1.0, 2.0, 3.0)
val x50 = DoubleVector(
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 0.0, //
)
val y3 = DoubleVector.random(3)
val y6 = DoubleVector.random(6)
val y50 = DoubleVector.random(50)
val y53 = DoubleVector.random(53)
println(x3 dot_* y3)
// x3 dot_* y4 doesn't compile
println((y3 concat y50) dot_* y53)
println((x50 concat y3) dot_* y53)
// (x50 concat y3) dot_* y6 doesn't compile
And the same showcase consumed from Kotlin:

val x3 = DoubleVector(1.0, 2.0, 3.0)
// throws 'Too many arguments for public constructor'
val y3 = DoubleVector.random(3)
// throws 'Does not conform to expected type'

Oops, doesn't work. I guess Scala uses too much syntactic sugar for Kotlin to compile against. I don't have a lot of experience with Kotlin, but I can see that Arrow is very ambitious, almost equivalent to rewriting the compiler. No doubt it will make many things easier (e.g. getting rid of that code generator). However, I still don't see the ability to summon axioms & proofs at the type level, which is important once tensor operations start to get hairy (e.g. …)
For TVM4J: we don't need to wait for the TVM4J release/publication; it is at the opposite end of our layer of abstraction. We can think only about autodiff, shape safety & forward/backward rule generation, and let the folks at DMLC worry about model compilation and deployment; their tools will definitely keep up with the latest TVM release. Then I just realised that TVM moved out of DMLC lately, so I may need to study their politics before making a prediction :-<
I believe the Arrow folks are adding support for type proofs, but it's still early in development. I had a chat with @pakoito about type-level integers a while ago; I'm not sure if this is a prototype for arrow-meta proofs or just a design discussion. It would be interesting to see if it were possible to push this direction a bit further; maybe it is possible to do some form of type-level arithmetic for concatenation and slicing. Maybe you could even implement shape-safe convolution operators. Tempted to try.
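Something like the following is what I have in mind (a sketch using Scala 3's built-in scala.compiletime.ops rather than arrow-meta; Vec is a hypothetical wrapper, not an existing library API):

import scala.compiletime.ops.int.+

// hypothetical length-indexed vector; N is a literal Int type
final case class Vec[N <: Int](data: Vector[Double])

// concatenation adds the lengths at the type level
def concat[M <: Int, N <: Int](a: Vec[M], b: Vec[N]): Vec[M + N] =
  Vec[M + N](a.data ++ b.data)

val v5: Vec[5] = concat(Vec[2](Vector(1.0, 2.0)), Vec[3](Vector(3.0, 4.0, 5.0)))
// val bad: Vec[4] = concat(Vec[2](Vector(1.0, 2.0)), Vec[3](Vector(3.0, 4.0, 5.0)))  // doesn't compile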
I see, so you want to consume Kotlin∇ from Scala. I guess it makes sense. I know there is an existing shape-safe tensor library for Scala called nexus; it might be interesting to compare what @ctongfei was doing with your planned DSL to see if there is anything that could be borrowed or improved. Right now, our AD implementation is pretty coupled with the shape-checking stuff, but it should be possible to provide a shapeless AD implementation, and as you mention, it would probably be a lot easier to implement the Kotlin∇ shape system in Scala anyway.
On nexus: no they don't. I've used it for a long time and it's been on his TODO list forever:
I should probably remind him that 2.13 is not required
Interesting, I was not aware of this! Do you think singleton types are capable of supporting convolutional arithmetic? I have seen implementations of addition and subtraction using Presburger arithmetic. If you were able to persuade Scala/Shapeless into performing multiplication and division at the type level, that would be pretty impressive/paperworthy. Maybe you could start out with something simple like unit strides, which just requires doubling, addition and subtraction, then work your way up to dilated convolution, which requires multiplication, division and floor:
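(The figures linked here aren't preserved; for reference, in standard convolution-arithmetic notation with input length $i$, kernel size $k$, padding $p$, stride $s$ and dilation $d$, the relations in question are presumably:)

$o = i + 2p - k + 1$  (unit stride, no dilation)
$o = \lfloor (i + 2p - k) / s \rfloor + 1$  (general stride)
$o = \lfloor (i + 2p - k - (k-1)(d-1)) / s \rfloor + 1$  (dilated convolution)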
Yep, easy. See my example at: https://github.com/tribbloid/shapesafe/blob/49a1c5b29d6d1ec619b2decd38c4b716cdeef713/demo/src/main/scala/edu/umontreal/kotlingrad/shapesafe/demo/DoubleVectorDemo.scala#L35-L35
OK, it's not easy at all. I spent too much time trying to get the implicit view pattern & singleton summoner pattern to work, until the legendary @DmytroMitin pointed out that a type class is still the best option. The experiment code went through some drastic changes to allow short definitions of complex arithmetic; the one related to convolution is right here:
Just found his project: https://summerofcode.withgoogle.com/projects/#5862505638789120 Nice work!
Very nice! I've been playing around with this a bit and just had a few comments/questions:
At your service. These are my answers:

Vectors do not (yet?) compose. The POC has only implemented DoubleVector, not matrix or tensor. They are not difficult (ideally by stealing code from the Dotty tuple: scala/scala3#2199), they just need a lot of code. Also, I'm using good old Breeze as the data representation, but it's a bit obsolete; it should move to a representation that supports at least big data (Apache Arrow, DJL), symbols and dual numbers (your library), or both.
How do you define a function with a fixed arity? Easy:
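(A minimal stand-in sketch with Scala 2.13+ literal types; DoubleVector here is a hypothetical placeholder, not the real shapesafe API:)

final case class DoubleVector[N <: Int](data: Vector[Double])

// fixed arity: only 3-vectors are accepted at compile time
def norm3(v: DoubleVector[3]): Double =
  math.sqrt(v.data.map(x => x * x).sum)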
I'm kind of bothered by the Church-encoded Nat.
How do you write a function where one argument is fixed and the other is generic? You just need to summon proofOfArity for one of them, something like:
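(The original snippet isn't preserved; here is a stand-in sketch reusing the hypothetical DoubleVector from above, where ValueOf summons the runtime witness of the literal arity N:)

final case class DoubleVector[N <: Int](data: Vector[Double]) // as above

// the first argument is pinned to arity 3, the second stays generic in N
def concatFixed[N <: Int](fixed: DoubleVector[3], generic: DoubleVector[N])(
    implicit proofOfArity: ValueOf[N]
): Vector[Double] = {
  require(generic.data.length == proofOfArity.value) // N recovered at runtime
  fixed.data ++ generic.data
}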
Minor issue, but the Scala plugin for IntelliJ IDEA does not type check compound expressions and must wait for compilation. Even worse, sometimes it gives false errors for type checks. I'm always kind of jealous of F# programmers on Windows; their IDE never diverges from the compiler. But it is bad politics; JetBrains thinks of scalac:
So it probably isn't worth their time. The problem is well known in the Scala community, and I heard that the Dotty team has ditched their grad-student-descent based methodology
I also have a few questions for you:
https://enoki.readthedocs.io/en/master/autodiff.html#graph-simplification For symbolic diff, this could be tested by:
The reason I'm asking is to gauge the difficulty of writing a JVM library that implements/extends the operands and graph nodes of a low-level autodiff IR (e.g. Relay IR in TVM2). I was initially against it as I thought it entailed too much work, but on second thought I believe you made the right decision: in the prototyping stage people may encounter all kinds of esoteric functions, and if they can't insert them into the graph without some serious hacking, they may be forced to ditch the effort.
I have another reason (though not 100% sure): I believe that if you want to implement it from scratch, then mixed-mode / hybrid should always be the default graph crawling algorithm (similar to the forward-backward algorithm for HMMs). I know the status-quo frameworks are already satisfied with reverse mode, but they are sub-optimal on hourglass-shaped architectures (autoencoder, GAN, fully-convolutional, UNet, HydraNet), and those are getting more pervasive. Also, naive & backward mode should be special cases of mixed mode, so maybe staying ahead of the curve is one way to avoid the dilemma later. The mixed mode should depend on the above-mentioned graph simplification; I don't have an exact idea at the moment, but I vaguely feel that it involves finding the smallest subset of the graph that can reach all the inputs/outputs. Probably it is not the most important thing at our level and should be delegated to the DMLC guys.
It depends on the representation you're using. In general, common subexpression elimination is NP-hard. If the graph is a DAG, the problem is GI-hard, or if the graph is a tree, it's something like (Matula, 1978), or maybe a little easier. It's a well-studied problem in the compiler literature, and there are lots of good resources and efficient implementations around. Due to frequent nonlinearities in typical deep learning pipelines, there are not often many algebraic simplifications you can do, but it could be a useful optimization for differentiable programming more broadly. Kotlin∇ uses some ad-hoc simplifications, but it's probably not something we should be rolling ourselves. I'm still looking for a good solution, there are some good resources on computer algebra in the readme.
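For intuition, here is the tree case as a toy hash-consing pass (a standalone sketch on a stand-in Expr type, not Kotlin∇'s actual IR): structurally equal subtrees are interned once, turning the tree into a DAG with duplicates eliminated:

import scala.collection.mutable

sealed trait Expr
case object X extends Expr
final case class Add(l: Expr, r: Expr) extends Expr
final case class Mul(l: Expr, r: Expr) extends Expr

object Cse {
  private val interned = mutable.Map.empty[Expr, Expr]

  // rebuild bottom-up, reusing the first structurally equal subtree seen
  def intern(e: Expr): Expr = {
    val canonical = e match {
      case Add(l, r) => Add(intern(l), intern(r))
      case Mul(l, r) => Mul(intern(l), intern(r))
      case leaf      => leaf
    }
    interned.getOrElseUpdate(canonical, canonical)
  }

  // (x + x) * (x + x): after interning, both operands are the same shared node
  def demo(): Unit = intern(Mul(Add(X, X), Add(X, X))) match {
    case Mul(l, r) => assert(l eq r) // reference equality: the subtree is shared
    case _         => ()
  }
}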
There is a toy symbolic expression generator in our tests. I need to write some more extensive test cases using property-based testing. Chapter 4 of my master's thesis describes the theory behind PBT and metamorphic testing, which were experimentally validated, but have not yet been integrated into our CI test suite. Our high-level approach is to generate a large number of random trees and run a sensitivity analysis across numerical inputs. We need to set up some regression tests to check for runtime performance and numerical stability, but the idea is that you specify an invariant, and throw an error if you can find a simplification and inputs which violate it. If you're interested, there are some mature testing frameworks which automate this process, maybe check out ScalaTest if you're unfamiliar.
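To sketch that invariant concretely (in ScalaCheck, since we're in Scala land here; the Expr ADT and simplify below are toy stand-ins, not Kotlin∇'s actual rewriter): generate random trees and check that rewriting never changes the value at a sampled input:

import org.scalacheck.{Gen, Properties}
import org.scalacheck.Prop.forAll

object SimplifySpec extends Properties("simplify") {
  sealed trait Expr
  case object X extends Expr
  final case class Const(c: Double) extends Expr
  final case class Add(l: Expr, r: Expr) extends Expr
  final case class Mul(l: Expr, r: Expr) extends Expr

  def eval(e: Expr, x: Double): Double = e match {
    case X         => x
    case Const(c)  => c
    case Add(l, r) => eval(l, x) + eval(r, x)
    case Mul(l, r) => eval(l, x) * eval(r, x)
  }

  // toy rewriter: e + 0 => e, e * 1 => e
  def simplify(e: Expr): Expr = e match {
    case Add(l, Const(0.0)) => simplify(l)
    case Mul(l, Const(1.0)) => simplify(l)
    case Add(l, r)          => Add(simplify(l), simplify(r))
    case Mul(l, r)          => Mul(simplify(l), simplify(r))
    case leaf               => leaf
  }

  def genExpr(depth: Int): Gen[Expr] =
    if (depth <= 0) Gen.oneOf(Gen.const(X), Gen.oneOf(0.0, 1.0, 2.0).map(Const))
    else Gen.oneOf(
      genExpr(0),
      Gen.zip(genExpr(depth - 1), genExpr(depth - 1)).map { case (l, r) => Add(l, r) },
      Gen.zip(genExpr(depth - 1), genExpr(depth - 1)).map { case (l, r) => Mul(l, r) }
    )

  // the invariant: a rewrite must never change the value at any sampled input
  property("simplification preserves semantics") =
    forAll(genExpr(5), Gen.choose(-10.0, 10.0)) { (e, x) =>
      eval(simplify(e), x) == eval(e, x)
    }
}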
This is another active area of research known as tensor contraction ordering or optimal Jacobian accumulation, which in general is NP-hard (Naumann, 2006). If you're just doing inference, there is a matrix chain multiplication algorithm (Hu & Shing). As you mention, the naïve implementation in most AD packages is not very efficient. For example, JAX uses reverse mode by default. You can also accumulate the Jacobians using a custom order (e.g. …)
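To make the inference case concrete, here is the textbook matrix-chain DP (a standalone sketch, not Kotlin∇ code), which finds the cheapest bracketing in O(n^3):

// dims has length n+1 for n matrices; matrix i has shape dims(i) x dims(i+1)
def chainCost(dims: Vector[Int]): Long = {
  val n = dims.length - 1
  val cost = Array.fill(n, n)(0L)
  for (len <- 2 to n; i <- 0 to n - len) {
    val j = i + len - 1
    cost(i)(j) = (i until j).map { k =>
      cost(i)(k) + cost(k + 1)(j) +
        dims(i).toLong * dims(k + 1) * dims(j + 1) // cost of the final multiply at split k
    }.min
  }
  cost(0)(n - 1)
}

// e.g. chainCost(Vector(2, 100, 100, 1)) == 10200: multiplying right-to-left,
// i.e. A(BC), beats (AB)C at 20200 -- the reverse accumulation order wins here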
After carefully studying your answers, I believe I have enough information to figure out a real fit in the market, and that niche is two-scale autodiff OR megakernels. This is a well-proven trick proposed by the MIT CSAIL soft robot team: https://news.mit.edu/2019/model-design-control-robots-target-1122 for large-scale physical simulation (good to know that you are working on tooling for robotics, which can make further discussion much shorter). In essence, the author argues that:
(Quote from the ICLR 2020 preprint: https://arxiv.org/abs/1910.00935 [1])

The capability is missing from most DL tools (all, if shape safety is a concern) for a good reason: tracing (the conventional sense of reverse autograd, referred to as the "tape" in torch & TF) is universally favoured for its flexibility (or, as they prefer, the 'define-by-run' feature), even for complex loops & control flows. This is where C++ crunches raw numbers directly after IR => bytecode compilation, and C++ / CUDA beats the JVM fair & square on pure computation power, so we don't have a fair game here. Plus, mixed-mode autograd is in this cluster of mess, and we may want to stay away.

SCT ("rewriter"-based), on the other hand, uses most of the techniques you are familiar with: AST simplification, symdiff, and it consumes little time on the JVM. Its weakness on complex control flow was bypassed in the megakernel design, in which only the ASTs of simple, monoidal execution blocks are rewritten & compressed (such that the tape skips most intermediate values) and parallelised as accumulators. Obviously this is not the only way for symdiff and autograd to work together, but it is a good start.

Unfortunately, I have yet to find a JVM-based deep learning framework that allows me to plug a complex enough megakernel into its tape (see my question at deepjavalibrary/djl#140; the first author of [1] and the second committer of DJL are both active on social media, so feel free to ask them directly). This is the only loose end of my hypothesis.

If every detail of my hypothesis can be proven, it means I'm talking about an industrial-grade autograd framework that:

- unifies physical simulation & machine learning
- leverages a mature ecosystem on the JVM
- has shape safety

Do you see any loopholes in my reasoning? If not, let's wait a few days for the answer.

P.S. Have you graduated? I know you won't be with Google, so what's your plan to find serious users of your work?
I've been studying DiffTaichi pretty closely over the last few months, and agree it's a good design choice. The tools they've built for physical simulation are also very impressive. My colleagues and I have been working on some similar ideas connecting differentiable physics and rendering that we hope to be able to share soon. Learning smoothing tricks to get stable gradients for collisions and discrete events has been really challenging and I think a good litmus test for where differentiable programming is headed in the future. I'm personally interested in making those techniques accessible to the broader field of software engineering.
This is probably more related to #8, but I am staying in school to pursue further research. Last year, I moved from UdeM to McGill, where I am now working to build a connection between automatic differentiation and learning on programs. I think the AD/ML community is doing important translational research, but in terms of this project, I feel the path to adoption requires taking a longer view on automatic differentiation than other libraries can afford. I think we're in a good position to do so, and have identified three high-level goals for Kotlin∇: (1) language design, (2) symbolic reasoning and (3) graph computation.

One of our goals for Kotlin∇ is to provide a staged eDSL that resembles an imperative program (containing variables, control flow, functions and data types) for mathematical code. The language should look and feel similar to a scripting language like Python, with a flexible type system and mathematically idiomatic syntax. I think it is important to actually use the DSL, so I will try to get to the point where it's fast enough for myself and others to use for training. Another project I recommend checking out is KMath, which shares a number of inspirations and plans to support a much larger API surface (if that's important to you).

Another goal for Kotlin∇ is providing tools for AD/compiler research: solvers for equational reasoning, term rewriting and symbolic computation. Although normalization and canonicalization are undecidable in general, if we impose certain constraints, the problem becomes "just" NP-hard, using Knuth-Bendix or similar completion algorithms. It would be nice to provide a tool for determining whether two symbolic expressions are semantically equivalent in a certain axiom system, using logic programming (e.g. miniKanren, Prolog, SAT/SMT solvers) or learning techniques (e.g. Siamese GNNs, symbolic pregression) for expression simplification.

Finally, our goal is to compile the entire graph (including control flow, data types, logic, etc.) to sparse matrices, where "executing" the program consists of pure matrix arithmetic. I think one day there will be a VM or interpreter for a high-level language that runs entirely on a matrix processor, only exiting to perform I/O. Users will be able to run existing programs and get parallelization for free. That's going to be a long-term project, but there is some progress we can make today, e.g. leveraging GPU/TPU/IPU intrinsics using GraphBLAS. Graph and matrix representations are one of the things I've been working on this summer.
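As a toy illustration of that last point (standalone Scala, not Kotlin∇'s actual representation): one BFS frontier expansion is just a Boolean matrix-vector product over the or-and semiring, which is exactly the primitive that GraphBLAS-style backends accelerate:

object MatGraph {
  // adj(i)(j) = true iff there is an edge i -> j
  def bfsStep(adj: Array[Array[Boolean]], frontier: Array[Boolean]): Array[Boolean] =
    Array.tabulate(adj.length) { j =>
      // the new frontier reaches j if some frontier node i has an edge i -> j
      frontier.indices.exists(i => frontier(i) && adj(i)(j))
    }

  // 0 -> 1 -> 2: starting from {0}, one step reaches exactly {1}
  val adj = Array(
    Array(false, true, false),
    Array(false, false, true),
    Array(false, false, false)
  )
  val next = bfsStep(adj, Array(true, false, false)) // = Array(false, true, false)
}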
I think a hybrid tracing and SCT-based design makes sense and is a good compromise for the JVM. Another approach from Wang et al. (2018) proposes multi-stage programming, which seems closer in spirit to what we are currently doing. Have you looked into Lantern and the LMS framework? I think it's a good architecture and also makes a lot of sense from a compiler perspective. What do you think are the advantages and disadvantages of these two approaches? Happy to discuss how to integrate with DJL or a similar backend. We don't want to reinvent the wheel, and I think that exercise would be helpful in the context of (1).
Thanks a lot for the ideas you brought up. If you are staying in school, what's the earliest time you could plan for a co-op program? Would you like to apply for GSoC next year, or some part-time job at a heavyweight JVM industry user (e.g. awslabs)? No offense, but I think that compared to hasktorch, a shape-safe JVM DSL alone could be much more relatable to these companies.

Sorry for my slow response; I don't know anything about compilers, and it took me a few months to mostly figure out the ecosystem of JVM-based staging. I remain uneducated in the state-of-the-art research, but can at least narrow down to one or two directions. There is one silver lining: my mind will no longer be contaminated by compiler front-ends (parsers, lexers, AST generation or anything suitable only for humans), a blessing that would have been impossible had I started reading a few years earlier.

LMS is a brilliant idea; I wish I could say the same about its execution, but it's too early to tell. The old LMS project in the 2014 paper got my attention twice, for two different prototypes, and in both cases it looks like a strong contender:
The way I read it, LMS stands out for eliminating the front-end, human-interacting part completely (aka the 'lightweight' in its name), making it equally transparent to human and machine agents alike; if your research outcome can do the same on Kotlin WITHIN ITS language specification, that will be awesome. But everything else should ideally be decoupled: the multi-staged, cross-platform code generation is important and should be in good working order, but it should be gradually replaced by more optimised components over a very long roadmap, instead of becoming a bilateral problem of go big or go home. This is why I said I'm not a big fan of its execution: it has two hard forks, which is bad enough. But the worse thing is that both of them may overstretch, and risk evolving into over-engineered clusters of inscrutable, esoteric elitist magic that no one wants to touch. These hard forks are:
TL;DR: I see LMS and difftaichi solving problems on two different levels, and their ideas can be made integratable and work together. Unfortunately, I'd rather wait and attempt their industry adoption later, after all the dust has settled and only one fork remains prevalent. In addition, there is a looming threat of the SIMD compiler becoming a winner-take-all market (of which MXNet is almost out), so one way or the other I encourage you to push your ideas and work early, instead of waiting for things to become mature.
BTW, is part of the GSoC 2020 work integrated into this new project? https://github.com/google-research/dex-lang Looks like a lot of overlap.
Hi @tribbloid, thank you for your interesting reply! This gives me a lot to think about. As you mentioned, there are some great projects in differentiable programming, including Myia, JAX, DEX, Hasktorch, TVM, MXNet, Software 2.0@FB, TF4j, KotlinDL and Swift, in no particular order. While I share your enthusiasm and have tremendous respect for the people undertaking these projects, I have grown skeptical that differentiable programming is the panacea it claims to be. Nevertheless, I think DP has a lot of mileage left and am glad we will have no shortage of good DSLs from which to choose.

I am currently in the middle of another project, and have been unable to devote as much time to Kotlin∇ as I would like. However, I will continue to support it and envision that our current line of research, if successful, may one day bear fruit for Kotlin∇. One of my passions, and the reason I started this project in the first place, was types. Like you, I was working on SQL code generation some years ago and fell in love with type-safe DSLs. Working on this project, I rediscovered that creative spark. I'm pleased to report that many of the features which Kotlin∇ introduced have been absorbed into KMath thanks to @CommanderTvis and @altavir. I would like to thank them for their efforts and strongly encourage users who need a stable API to check out their work.
Agreed, we need an IR which is interpretable to both consumers. λ-expressions are extremely powerful metaprogramming tools, but hopelessly unreadable. I am awed and jealous of anyone who can tame the hydra that is Lisp! As an end-user, types feel like the promised bicycle for the mind, but type-safe metaprogramming leaves much to be desired. Recent work seems to be addressing that, but type-level programming is an area of ongoing research and I'm not convinced it's ready for primetime yet. Recently I've been learning about logic programming, which seems like a suitable compromise for many domain-specific tasks. If you have any experience using Datalog/Prolog, I'd be curious to get your feedback about it. I've learned that logic programming and graph query languages share many similarities, and many techniques in logic programming can be evaluated using graph- and lattice-based algorithms.

One area I think Theano got right was incorporating a number of graph-based algebraic optimizations. There are a few projects which have tried to organize a similar set of domain-specific rules, including Rubi, Fungrim, Metamath et al. I'm sure there are more automated approaches, but it helps to have a database of common identities from which to start. I feel these abstractions could be collected and reused for compiler optimization and DSL generation. Another place where types arise is knowledge graphs, which seem to have a lot of promising applications for automated reasoning. I think it would be a nice idea to have a knowledge-based compiler for domain-specific applications such as matrices and random variable arithmetic. One technique which is apparently well-known in the compiler community is the idea of equality saturation: users write down a bunch of equalities which the compiler can apply to optimize some desired heuristic (e.g. speed or numerical stability). This approach seems well-aligned with recent work on learning to rewrite.

TL;DR: Kotlin∇ is on the back burner for a while. I need to make progress on my Ph.D., which grew out of ideas in this project, and I hope one day to return and plant more, but it will take some time for that to happen.
Purpose
Introducing a secondary tensor operation DSL (Domain Specific Language) written in & optimised for the Scala language & its various compilers (the most common of which is the JVM-based scalac 2.12+; other options are Scala.js & Dotty).
Despite not being a dependently typed language, Scala may be favourable for some reasons:

- infix syntax: vec1 dot_* vec2 can be easier to read than dotProduct(vec1, vec2)
- the singleton-ops library and its associated discussions in SIP-23 have suggested that Scala 2.13 & Dotty will use a more direct & expressive approach

Usage
A PoC project has been submitted at:
https://github.com/tribbloid/shapesafe/tree/master
At this moment (06/07/2020) it has only implemented very few operations (dot_product, concat) for double vectors. However, most features in the following list have been proven and can be showcased by running gradle test on DoubleVectorSpec.scala.

Required & proven features:
- type-level arithmetic using the singleton-ops library (which is actively maintained by the Lightbend core team)

Required & unproven:
- matrix & higher-rank tensor shapes (e.g. using shapeless.HList)

Nice to have:
- avoiding Church-encoded numbers (shapeless.Nat), which cause slow compilation

Not in the scope:
How it fits into roadmap
The competition for supremacy of the deep learning ecosystem is brutal and unforgiving. With torch & tensorflow dominating both the research & development phases, people have little reason to steer away from Python & a dynamically typed, procedurally validated scaffolding paradigm. But there are exceptions: the large-scale, mission-critical, complex systems in production, like autopilot and SLAM, most likely prefer spending much effort reinventing & maintaining a more verbose and arduous code base written in C++ or other low-level languages. For these systems, the demand for built-in correctness and predictability of execution far outweighs the ability to write more concise code.
This is, IMHO, the market niche where kotlingrad can fit in: mega-engineering rather than prototyping. In particular, to enable users to:
…in that order. My design & optimisation of DSLs should be consistent with this doctrine. The choice of Scala & JVM as the carrier should naturally put kotlingrad in the ecosystem of Apache Spark, and maybe of RISC-V in the long run, both of which are for large-scale production.