Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

applet.program.psoc1: implemented #672

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

purdeaandrei
Copy link
Contributor

@purdeaandrei purdeaandrei commented Aug 22, 2024

Here's a run of it:
image

Copy link
Member

@whitequark whitequark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks pretty interesting. This is a really large contribution (2KLOC!) so that means it'll take me a while to review.

The massive state machine written in not very idiomatic Amaranth scares me.

You will need to collect the documentation you used to implement this programing algorithm and add it to the documentation repository while linking it in the files where it is relevant to understanding.

Also, I would like to acquire a PSoC 1 developer board that can be used to test the applet, unless you're willing to commit to being responsible to maintaining the applet indefinitely (and even then, I'd rather have one around).

@purdeaandrei
Copy link
Contributor Author

The massive state machine written in not very idiomatic Amaranth scares me.

Would be glad to hear ideas of how to restructure it. It's the first piece of Amaranth code i've written. I definitely am impressed with Amaranth, implementing the same thing in verilog would have been a lot more painful.

The state machine is in the IDLE state when doing nothing. For some short commands (such as assert/deassert reset), it just changes the outputs, and goes back to IDLE. There are two long-lasting commands: CMD_SEND_BITS, which sends up to 64Kib from the FIFO, followed by an optional wait-and-poll, followed by an optional 40 zero-bits. And the other is CMD_READ_BYTE, which takes an address, a byte count, and reads those bytes, incrementing the address internally. Frankly I do see how it's actually more complicated than it needs to be. It's not like USB bandwidth is a bottleneck here. I think for reading bytes I could get rid of the auto-increment feature, and have the host send a bunch of read command+address pairs, and read the results of that all at once. Also the 40 zero-bits can be sent by the host manually, there's no reason to complicate the state machine with it. And my initial thoughts of why I did it that way is invalid (I was thinking to minimize host-to-glasgow communication which really isn't something that needs to be optimized for. The only thing that needs to be optimized is to minimize the amount of separate occasions the host waits for data before sending the next command. i.e. reading all bytes, with a single read byte command followed by waiting for it, and ontly after sending the next read-byte command would be a waste)

Also, I would like to acquire a PSoC 1 developer board that can be used to test the applet, unless you're willing to commit to being responsible to maintaining the applet indefinitely (and even then, I'd rather have one around).

These chips don't need any support components to be able to be programmed (except perhaps decoupling caps? but I bet you could get away even without those.) They just need 5v (or less), gnd, and 3 signal wires (xres, sdata, sclk). No regulators/reset monitors/oscillators/pull resistors, really nothing needed.

If you only need it for testing the applet, and don't want to experiment with other peripherals, then I suppose easiest solution would be, I could solder up these components I don't need with a pinout that would match the applet's default pinout, maybe add a led to be able to test a blinky app programmed onto it, and send it to you:
psoc1

Or you could do the same, I got this chip on the picture (CY824794), and QFN adapter off of ebay or aliexpress. The chips cannot be permanently security locked, bulk erase always works on them so it's safe to get them second hand.

@whitequark
Copy link
Member

I could solder up these components I don't need with a pinout that would match the applet's default pinout, maybe add a led to be able to test a blinky app programmed onto it, and send it to you:

That works for me. I have the skill to solder them but ever since my disability got worse I just don't have the time to do all of the things I used to have fun ;;

I definitely am impressed with Amaranth, implementing the same thing in verilog would have been a lot more painful.

For sure! I think you're actually not taking full advantage of the language (aside from some Verilog-isms I'm seeing that I didn't comment on in detail because it would take longer than I have available right now, we have higher level features like streams that I'm pretty sure would work well here).

Would be glad to hear ideas of how to restructure it [...]

I can throw some ideas your way but it might be a lot more productive for us to share a VS Code Live Share and a voice meeting. Are you available for that?

@whitequark
Copy link
Member

By the way, how was your Amaranth learning experience? Sadly we still don't have a tutorial, though I think the guide can pass for one if you already have HDL experience. People used to complain (I mean this in a neutral sense!) that learning Amaranth was hard while we had very little docs, but now almost nobody gives any kind of feedback, positive or negative, which is in some way even more difficult.

…he read_address register, just use byte as normal!
Only reference latest versions of each variant, and add
comments with accession info on top of the vectors.py file
@purdeaandrei
Copy link
Contributor Author

purdeaandrei commented Aug 23, 2024

I've simplified the state machine a bit.

  • the zero bits that are needed after polling are now sent by the host, rather then generated by the state machine
  • when reading bytes, the address generation now happens on the host
  • since the address doesn't need to be memorized while shifting it, because we don't need to increment it anymore, that saves another register, so we can load the address directly into byte

I also made a PR with the ISSP spec documents, and adjusted the commend references in the source code. Should I change my "#Rev K" comments to #G00090 ? (for the areas where I am referencing which vector and which chip is covered by which document).

What I considered but didn't do:
Right now to read 1 byte, one has to send two bytes (a command and an address)
I could further simplify the state machine, to not generate the 3-bit opcode and not generate the 3-bit tail end, but for it to be sent using CMD_SEND_BITS, which could also send the address, while CMD_READ_BYTE could take care only of the bus turnaround, and the shifting of the bits. I don't like this because it would mean that the host has to send 6 bytes every time it would want to read a single byte, and that feels like too much, just for the sake of simplifying the state machine.

I didn't use streams cause our conversation on the amaranth PR, sounded like it wasn't clear what the fifo stream interface will look like in the future, and I thought that it would be easier to just use the same simple fifo interface as other applets are using, cause it would make it easier to upgrade in the future when the fifo stream interface gets stabilized.

Another thing I want to note is that for the outputs I have used FFBuffers to keep them close to the pins with consistent timing.
That means that I cannot take the output of those flipflops and use them as inputs for my own logic. So what I've done is that there's a second shadow-flipflop for each of these IO-embedded flipflops, which can be used for inside-fpga logic.

And that's why you see me assigning outputs in the combinational domain like this:

m.d.comb += sdata_o_nxt.eq(1)
m.d.comb += sdata_oe_nxt.eq(1)

This way it allows me to control those signals from within the state machine, but without taking the penalty of 1 cycle of latency to the actual output. I find it easier to reason about my logic this way. If I were to accept the latency I feel like I'd have to keep more info in my head to understand how it works. But I feel like this is what you may be referring to as verilog-ism? This kindof forces me to use _nxt, because I can't use the same name for assigning new values into and for looking at the output of the flipflop, like I could with m.d.sync-assigned signals, and the _nxt makes it look verilog-ish...

There's another instance of me using _ff for edge-detection on sdata, but I though I was supposed to do that, because Rose() has been removed from Amaranth.

Maybe we can have a live chat in the weekend? Not sure what vscode live share is like, does codium support it? I mostly live in terminal-based text editors. But I could do VNC.

Regarding Amaranth learning experience:

  • At some point somehow I reached an old version of the documentation, without me realizing. This made me think that a bunch of stuff is not documented. I realize it's absolutely my fault, and later I realized and went back to reading the rest of the docs that I was missing, but perhaps the old doc websites could be edited to make the version number pop out better, and perhaps have a clear link there to the latest docs.
    So Instead of:
    image
    Do this?:
    image

  • The docs are referring to some very outdated tutorials: https://amaranth-lang.org/docs/amaranth/latest/tutorial.html I haven't looked at all of these, but for example I went looking at the graded exercises, and the friction I experienced was annoying. I.e. code examples not working, me needing to figure out how to fix it, what to replace it by, etc...Those tutorials are still valuable cause they cover things not in the official documentation, like formal verification, so I don't know what I suggest, I definitely wouldn't want to get rid of those links, Ideally it would be best to get outdated things updated in the tutorials. But I guess it takes time.

  • When it comes to the official documentation's quality I don't really have complaints. Perhaps some things I feel are missing would be some examples of how to use with specific toolchains, like the yosys with ice40, how to specify constraints, etc...? But maybe the info is there and I just didn't see it.

@whitequark
Copy link
Member

Not sure what vscode live share is like, does codium support it?

No, Live Share is for Microsoft-branded VS Code only.

  • At some point somehow I reached an old version of the documentation, without me realizing. [snip]

Ah, that's unfortunate. The problem is that "Go to Latest" is not desirable as an action item either! This is because "latest" can significantly differ from the latest release and that can also be confusing. "Latest release" is more tractable but I don't know how to do this using the Sphinx pipeline in a way that wouldn't be a headache to maintain. (Right now the old docs are not regenerated at all, they're treated as static HTML.)

  • The docs are referring to some very outdated tutorials:

I do this as basically harm reduction. At the time I wrote that page, I had no idea how to write... well, documentation in general, but tutorials in particular. Now I know how to write good documentation but I still haven't written a first-party tutorial. I guess I should at least put a prominent notice saying that the community tutorials are outdated, or perhaps remove them entirely.

  • I went looking at the graded exercises

Can you tell me more about what you expected to find? Do you mean "graded" in an academic sense? I don't really have an academic background (I never formally studied CS for example) so I'm not quite sure what do you mean by that.

  • Those tutorials are still valuable cause they cover things not in the official documentation, like formal verification

I'm not so sure about that; the formal verification experience with Amaranth is actually pretty poor, largely because of poor interfacing with Yosys (sby is not a great interface), and right now, while it's present, it's intentionally not documented or advertised as a feature. I think removing that link is probably fine, personally.

I don't think I can update the tutorials, they're not under my control, they were written by third parties some of which I've never even really talked to...

  • When it comes to the official documentation's quality I don't really have complaints.

Oh, nice!

Perhaps some things I feel are missing would be some examples of how to use with specific toolchains, like the yosys with ice40, how to specify constraints, etc...? But maybe the info is there and I just didn't see it.

It's not. The platform layer is one of the last remaining things that aren't documented well. It's on the TODO list, though I don't have an ETA.

Thanks for such a detailed answer.

@whitequark
Copy link
Member

I didn't use streams cause our conversation on the amaranth PR, sounded like it wasn't clear what the fifo stream interface will look like in the future, and I thought that it would be easier to just use the same simple fifo interface as other applets are using, cause it would make it easier to upgrade in the future when the fifo stream interface gets stabilized.

Quick note on this: the Glasgow FIFOs are definitely going to offer streams in a way similar to the current implementation, and since Glasgow doesn't offer any kind of backwards compatibility for dependent code, if I decide to change it after all, I can do it in a single PR. I would say that all new applets should use the stream interface and not the old FIFO style interface. (Perhaps we should do a deprecation cycle on the latter.)

The Amaranth FIFOs will have backward compatibility guarantees, and will involve an RFC. Until the RFC is merged I can't say what exactly the interface will be.

@whitequark
Copy link
Member

Another thing I want to note is that for the outputs I have used FFBuffers to keep them close to the pins with consistent timing.
That means that I cannot take the output of those flipflops and use them as inputs for my own logic.

If you use streams and IOStreamer this problem is solved, as the latency of the FFBuffer is handled by the IOStreamer primitive. All you need to do is to register, while sending an output bit, whether you need an input bit back, and to use streams to implement your logic.

@purdeaandrei
Copy link
Contributor Author

Can you tell me more about what you expected to find? Do you mean "graded" in an academic sense? I don't really have an academic background (I never formally studied CS for example) so I'm not quite sure what do you mean by that.

Apologies, I meant specifically this tutorial: https://github.com/robertbaruch/amaranth-exercises which says "Graded exercises" in its title. I also don't know why they're called graded. sent a minor PR there, and it was merged in less than a week, so the repository is alive: RobertBaruch/amaranth-exercises#3 However there are other issues too, that were not so obvious to solve and needed some experimentation and I didn't have the energy. Specifically for example the lack of Past() and Rose(), I could certainly do an _ff signal, but I am not sure how that interacts with formal verification. It was also my first time actually playing with formal verification, so parts of the tutorial I could go through were interesting, but I may not be qualified to fix the things that don't work.

@whitequark
Copy link
Member

Specifically for example the lack of Past() and Rose(), I could certainly do an _ff signal, but I am not sure how that interacts with formal verification.

The reason Past() was removed is that the behavior of it under reset was ambiguous and poorly defined. By using an _ff signal (with explicit reset-less behavior, or not) you rectify that, and also gain some other things, like being able to use arbitrary expressions (which weren't supported by Past()).

…of sending a bitstream, if we think we might be right after a power cycle.
@purdeaandrei
Copy link
Contributor Author

I've switched to using the fifo stream interface. Also I removed some unnecessary combinatorial feedback from the fifo signals, and it should now also follow stream rules.
I considered switching to IOStreamer, however I see a few issues that make it seem not so great for my usecase:

There doesn't seem to be a way to only update a subset of the ports. So then I'd need 3 separate IOStreamers, which would feel wrong, as I would fear the risk of them desynchronizing. Well, it won't really happen since I probably won't have any backpressure on the i_streams but still, it feels wrong to split up the IOstreamers. Or I would need to keep the previous value of non-updated ports in flops, which I'm already doing, but then the o_latch in IOStreamer would be a waste.

Sometimes I want to use the previous value for more than just keeping certain ports unchanged. Like for example in the case of the xres signal, I want to make a decision in the state machine based on whether it was asserted or not. And in the case of sclk I want to know in which half ot the clock cycle I was, so I can restart the timer to count the low side of the clock automatically.

So it seems like in all of these cases I have to maintain my own flopped versions of these signals anyway, and IOStreamer doesn't help me with that. It also doesn't help me with sampling the data signal, at least not the way I have it now, because I was paranoid and decided to sample the data signal after synchronizing it with an FFSynchronizer. The sdata signal can be non-synchronous, but to be fair it should only ever be non-synchronous when we're doing polling. When we're shifting bits from the device, it's expected to be synchronous, so to be fair, I could use the non-FF-synchronized version of the signal for shifting, and the synhronized signal for polling, and yeah, in that case I could use IOstreamer to abstract away the details of when the FFBuffer samples the input. But still using IOStreamer in this case feels a bit less intuitive to me.

@whitequark
Copy link
Member

So the issue here is that the bare state machine (and the ancillary stuff around it) is extremely complex and I don't know if I'll have time to review it anytime in the next 3 to 9 months.

@purdeaandrei
Copy link
Contributor Author

purdeaandrei commented Aug 25, 2024

I made some more changes to improve readability:

  • I factored out the _nxt / _ff verilogisms into a class PinDriver, to avoid code duplication and to separate concerns. Should be more readable this way. It's not a component, it's just a way to group code for readability.
  • I used amaranth structlayout and unionlayout, to more readably describe the command byte. Unfortunately when I tried to build the command byte using this feature, so I don't have to hard-code bit offsets on the python side (see here: purdeaandrei@ff83059 ) my code became twice as slow. Looks like contructing a Const() object for a single command byte takes 0.14 milliseconds! So I fell back to hardcoding the offsets on the python side, but I added some asserts that check the offsets are correct, so if in the future someone changes the data structure, the assert will alert them to update the python side too.

Not sure if it can help with the review, cause it ended up a bit ugly, but I also made a mermaid chart for the SEND_BITS flow:
here (it only covers reset-mode operation when an XRES pin is available)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants