Cloud-init bi-weekly status

Posted on Mon 10 June 2019 in status-meeting-minutes

Meeting information

Meeting summary

LINK: https://cloud-init.github.io

Previous Actions

The discussion about "Previous Actions" started at 16:23.

Recent Changes

The discussion about "Recent Changes" started at 16:24.

In Progress Development

The discussion about "In Progress Development" started at 16:30.

Office Hours

The discussion about "Office Hours" started at 16:45.

Office Hours (next ~30 mins)

The discussion about "Office Hours (next ~30 mins)" started at 16:48.

Vote results

Done items

  • (none)

People present (lines said)

  • blackboxsw (39)
  • rharper (39)
  • AnhVoMSFT (29)
  • cyphermox (12)
  • robjo (6)
  • meetingology (4)
  • ubot5 (3)
  • paride (1)
  • Odd_Bloke (1)

Full Log

16:19 <blackboxsw> #startmeeting Cloud-init bi-weekly status

16:19 <meetingology> Meeting started Mon Jun 10 16:19:45 2019 UTC. The chair is blackboxsw. Information about MeetBot at http://wiki.ubuntu.com/meetingology.

16:19 <meetingology>

16:19 <meetingology> Available commands: action commands idea info link nick

16:19 <rharper> o/

16:20 <Odd_Bloke> o/

16:20 <blackboxsw> hi cloud-init folks. let's kick off the bi-weekly meeting again

16:21 <blackboxsw> our last meeting minutes are hosted on github

16:21 <blackboxsw> #link https://cloud-init.github.io

16:22 <blackboxsw> welcome all. Generally cloud-init upstream uses this meeting to provide a platform for status updates, raising questions or concerns and feature discussion. All are encouraged to participate as you see fit.

16:22 <blackboxsw> our format is the following topics: Previous Actions, Recent Changes, In-progress Development, Office Hours

16:23 <blackboxsw> interjections and additional topics are welcome

16:23 <blackboxsw> #topic Previous Actions

16:24 <blackboxsw> Checking last meeting's minutes, we were clear of old actions.

16:24 <blackboxsw> so we'll jump to the next topic this week.

16:24 <blackboxsw> #topic Recent Changes

16:26 <blackboxsw> the following commits landed in cloud-init tip since the last status meeting

16:26 <blackboxsw> - Allow identification of OpenStack by Asset Tag

16:26 <blackboxsw> [Mark T. Voelker] (LP: #1669875)

16:26 <blackboxsw> - Fix spelling error making 'an Ubuntu' consistent. [Brian Murray]

16:26 <blackboxsw> - run-container: centos: comment out the repo mirrorlist [Paride Legovini]

16:26 <blackboxsw> - netplan: update netplan key mappings for gratuitous-arp

16:26 <blackboxsw> [Ryan Harper] (LP: #1827238)

16:26 <ubot5> Launchpad bug 1669875 in OpenStack Compute (nova) "identify openstack vmware platform" [Wishlist,Confirmed]

16:26 <ubot5> Launchpad bug 1827238 in cloud-init "Machines fail to deploy because cloud-init needs to accept both netplan spellings for grat arp" [Medium,Fix committed]

16:30 <blackboxsw> I was poking around our Trello board to see if we've moved other cloud-init-related content into the done lane, but I think those commits roughly capture the recent work

16:30 <blackboxsw> #link https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin

16:30 <blackboxsw> #topic In Progress Development

16:31 <blackboxsw> our active reviews are located here (as mentioned in the topic)

16:31 <blackboxsw> #link https://code.launchpad.net/cloud-init/+activereviews

16:32 <blackboxsw> Goneri: thanks for all the work on freebsd branches, there has been some good momentum there

16:32 <blackboxsw> there is ongoing work on the Azure datasource that will likely land in the next week or two

16:33 <paride> ^^ "run-container: centos: comment out the repo mirrorlist" is only relevant when using an http/https proxy; in all other cases the mirrorlist works as usual

16:33 <blackboxsw> and some network-related changes landing shortly

16:33 <blackboxsw> paride: thank you paride for the extra note

16:33 <AnhVoMSFT> blackboxsw can you share more details on the Azure datasource work? Any bug that we can reference?

16:33 <blackboxsw> I was thinking https://code.launchpad.net/~jasonzio/cloud-init/+git/cloud-init/+merge/364012 AnhVoMSFT

16:35 <rharper> related to covering all the network-related scenarios so that we configure networking in a way that ensures access to IMDS and the internet in the face of additional static IPs on the same subnet as the primary interface, multiple DHCP interfaces with default routes,

16:35 <AnhVoMSFT> I see - I think a bigger change may be needed there, as there was some issue around identifying the primary/secondary NIC. We got confirmation from our networking team that the first NIC returned is the primary

16:35 <rharper> AnhVoMSFT: good to know; that was our observation

16:36 <rharper> AnhVoMSFT: https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1815254 , related as well; the plan being to put in place some source-based routing;

16:36 <ubot5> Launchpad bug 1815254 in cloud-init (Ubuntu) "Azure multiple ips prevent access to metadata service" [Undecided,Confirmed]

16:38 <AnhVoMSFT> thanks rharper - is that something that should be changed/fixed from cloudinit, or is this more platform related?

16:38 <rharper> that's a good question; generally it would be great if a platform were to include source-routes and metrics in the config they send

16:38 <AnhVoMSFT> if the latter I will file a workitem on our side to go do some research and get the right team to take a look at it

16:39 <rharper> currently no cloud does this, rather some indicate a primary via metadata, and then the OS scripts apply a metric to all non-primary routes to ensure that default routes go to the primary

16:39 <AnhVoMSFT> I see - so I guess we can do similarly on Azure since we know what the primary is (first nic returned in IMDS)

16:40 <rharper> AnhVoMSFT: so in the short term, I think cloud-init should (where possible with the OS network config) provide additional tuning (likely post-scripts in some cases) to tune the routing for what cloud-init knows is the primary route

16:40 <rharper> AnhVoMSFT: yes, I prefer a primary=True or whatever, but it's good enough to have the current behavior documented (in the code)

16:40 <AnhVoMSFT> thanks rharper

16:40 <rharper> so if it changes/breaks, then we know
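
The short-term approach discussed above (treat the first NIC listed by IMDS as primary, then give every non-primary route a higher metric so the default route goes via the primary) could be sketched roughly as follows. This is a hypothetical Python illustration, not cloud-init code: the function name `build_netplan_ethernets`, the metric values 100 and 200+, and the (name, MAC) input format are all assumptions. The output is a plain dict shaped like netplan's `ethernets` section using its `dhcp4-overrides`/`route-metric` keys; it does not attempt the source-based routing mentioned for LP: #1815254.

```python
from typing import Dict, List, Tuple

# Hypothetical metric values: lower metrics win in the Linux routing table.
PRIMARY_METRIC = 100
SECONDARY_BASE_METRIC = 200


def build_netplan_ethernets(nics: List[Tuple[str, str]]) -> Dict[str, dict]:
    """Return a netplan-style 'ethernets' mapping for an ordered NIC list.

    nics is a list of (interface_name, mac_address) pairs in the order the
    metadata service returned them; the first entry is treated as primary.
    """
    ethernets: Dict[str, dict] = {}
    for index, (name, mac) in enumerate(nics):
        # Primary NIC gets the lowest metric; each secondary NIC gets a
        # distinct, higher metric so its default route is never preferred.
        metric = PRIMARY_METRIC if index == 0 else SECONDARY_BASE_METRIC + index
        ethernets[name] = {
            "match": {"macaddress": mac},
            "set-name": name,
            "dhcp4": True,
            "dhcp4-overrides": {"route-metric": metric},
        }
    return ethernets


if __name__ == "__main__":
    config = build_netplan_ethernets(
        [("eth0", "00:0d:3a:00:00:01"), ("eth1", "00:0d:3a:00:00:02")]
    )
    for name, settings in config.items():
        print(name, settings["dhcp4-overrides"]["route-metric"])
```

Keeping the "first NIC is primary" rule in one explicit place is what rharper means by documenting the current behavior in code: if the platform ordering ever changes, this is the assumption that breaks.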

16:44 <rharper> I think that covers our in-progress items for the moment

16:45 <rharper> not sure if the bot will listen to me, but just in case

16:45 <robjo> Be mindful that in Azure the metadata service may lag behind by minutes w.r.t. secondary IPs on an interface

16:45 <rharper> #topic Office Hours

16:45 <rharper> robjo: in general, my understanding is that the instance has to be offline to change vnets and such, and booting back up has been enough time to see IMDS updated; do you see it differently?

16:46 <AnhVoMSFT> robjo that is good to know, I will check on that

16:46 <robjo> We've had various issues with cloud-netconfig due to the metadata server in Azure being slow and reverted to polling, which of course got us in trouble with API rate limits

16:46 <rharper> robjo: interesting

16:47 <rharper> We'll be here in channel, so if you've got merges or bugs that need an eye, or just questions, fire away

16:47 <AnhVoMSFT> robjo feel free to file a bug on that and we will investigate - IMDS is our partner team so we'll get some answer quickly there

16:48 <AnhVoMSFT> rharper, a couple things I want to ask for Office Hours

16:48 <robjo> AnhVoMSFT: We have been working with Stephen Zarkos on the issues

16:48 <blackboxsw> #topic Office Hours (next ~30 mins)

16:48 <AnhVoMSFT> robjo I will ping Stephen and get more detail and see if we have any follow up items

16:48 <blackboxsw> sorry folks got pulled away for a bit thx rharper

16:48 <robjo> And double-checked that the polling direction was OK from the Microsoft perspective before we implemented that

16:49 <AnhVoMSFT> I see, glad you're not blocked on it

16:50 <robjo> rharper: We always had bug reports that upon reboot not everything was always configured when secondary IP addresses were in play. But theoretically yes upon reboot everything should be there

16:50 <AnhVoMSFT> rharper we have a customer who booted up a VM based on 18.04, which uses netplan. Cloudinit wrote a netplan file to the image. He then installed ifupdown, then had some networking change which triggered a MAC address change. Upon rebooting, cloudinit tried to use eni, but the netplan file was still there, which messed up the VM's network config

16:50 <robjo> putting cloud-netconfig into polling mode pretty much addresses the issues we had reports about

16:51 <rharper> AnhVoMSFT: yes; that sounds very likely

16:51 <rharper> AnhVoMSFT: did they file a bug?

16:51 <rharper> cloud-init net "detects" which service is present

16:51 <AnhVoMSFT> I'm checking to see if this should be a bug, or that is expected behavior

16:51 <rharper> so if they did not uninstall netplan.io then cloud-init will likely prefer that over eni

16:52 <AnhVoMSFT> cloudinit actually prefers eni if ifupdown is installed, I think

16:52 <rharper> AnhVoMSFT: so the /etc/netplan/*.yaml files would only trigger things if netplan is still present; the systemd generator will read the YAML and write out networkd files
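
The "detection" discussed above amounts to walking a priority list of renderers and picking the first one whose tooling is present. The sketch below illustrates only that idea; the helper name `select_renderer`, the priority lists, and the availability checks (looking for the `netplan` CLI and for `ifup`/`ifdown` on PATH) are simplified assumptions, not cloud-init's actual detection logic.

```python
import shutil
from typing import Callable, Dict, List, Optional


def _netplan_available() -> bool:
    # Simplified assumption: netplan counts as available if its CLI is on PATH.
    return shutil.which("netplan") is not None


def _eni_available() -> bool:
    # Simplified assumption: ifupdown counts as available if ifup/ifdown exist.
    return all(shutil.which(cmd) is not None for cmd in ("ifup", "ifdown"))


DEFAULT_CHECKS: Dict[str, Callable[[], bool]] = {
    "netplan": _netplan_available,
    "eni": _eni_available,
}


def select_renderer(
    priority: List[str],
    checks: Optional[Dict[str, Callable[[], bool]]] = None,
) -> Optional[str]:
    """Return the first renderer in priority order whose check passes."""
    checks = DEFAULT_CHECKS if checks is None else checks
    for name in priority:
        check = checks.get(name)
        if check is not None and check():
            return name
    return None


if __name__ == "__main__":
    print(select_renderer(["netplan", "eni"]))
```

When both netplan.io and ifupdown are installed, `select_renderer(["netplan", "eni"])` and `select_renderer(["eni", "netplan"])` give different answers, which is why the configured priority, and whether netplan.io was actually removed, decides what happens in the customer scenario.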

16:53 <AnhVoMSFT> right, I think the customer's mistake was to not uninstall netplan (or remove any netplan configuration file) after installing ifupdown

16:53 <rharper> AnhVoMSFT: right; I think we'll need to see the log and system state, but it sounds like an incomplete uninstall of netplan

16:53 <rharper> uninstall of netplan should be enough to make the cloud-init.yaml inert

16:54 <rharper> https://netplan.io/faq#how-to-go-back-to-ifupdown

16:54 <rharper> AnhVoMSFT: it should have automatically uninstalled netplan.io

16:54 <AnhVoMSFT> I'm not sure if there is much we can do from the cloudinit side - perhaps if choosing eni, disable the cloud-init netplan yaml

16:54 <rharper> AnhVoMSFT: well, we could check writable paths of the renderers

16:54 <AnhVoMSFT> rharper I don't think that is the behavior on 18.04 - installing ifupdown will not uninstall netplan

16:55 <rharper> AnhVoMSFT: you're right; =(

16:55 <rharper> that sort of feels like a bug in the packaging

16:55 <AnhVoMSFT> yes, I share the same sentiment

16:56 <AnhVoMSFT> I will go ahead and file a bug so even if we don't have a short term action we can still capture the discussion

16:57 <rharper> AnhVoMSFT: thanks, I'm pinging in #netplan and the bug will be great so we can figure out the right plan

16:59 <AnhVoMSFT> second question: We have an intern working on our team, and as part of ramping up on cloud-init he wrote some additional capabilities into cloud-init analyze, adding a "boot" module (in addition to show/blame/dump) that collects timestamps of phases that happen during VM boot before cloud-init starts, such as kernel initialization and systemd initialization.

17:00 <AnhVoMSFT> this should work for all clouds (he tested on AWS/GCP). It currently only works for distros that use systemd. He'll try to figure out how to get those counters for FreeBSD and others

17:00 <AnhVoMSFT> rharper since you were the original author of analyze, I'm trying to gauge the interest on this and we're open to suggestions/questions

17:01 <cyphermox> rharper: they can coexist and configure each their own interface, so it's not a conflict. It's no different than coexisting ifupdown and NetworkManager, or also NetworkManager and systemd-networkd

17:01 <rharper> AnhVoMSFT: that sounds excellent

17:01 <blackboxsw> nice AnhVoMSFT on the commandline extensions!

17:01 <rharper> AnhVoMSFT: happy to review branch or Work-in-Progress when it's available

17:02 <AnhVoMSFT> thanks rharper blackboxsw we will have that in a branch very soon.

17:03 <AnhVoMSFT> cyphermox if that is the case then either the customer or cloudinit needs to make sure the system does not have conflicting configuration for netplan/eni.

17:03 <rharper> cyphermox: ok; would you be open to some sort of warning about having config in both or something? I dunno; it's just not a great experience to add the new package, configure it, reboot and not have networking since the same interface was configured (differently) in both packages

17:03 <blackboxsw> yeah, I'm quite interested in any additional CLI functionality that makes cloud-init more versatile as a system debug tool

17:04 <cyphermox> rharper: I'm not opposed to a warning, but that's not necessarily better UX.

17:05 <cyphermox> debconf prompts are quite annoying to have at upgrade, and just writing it out people are likely to miss it altogether

17:05 <cyphermox> (so you wouldn't really gain much)

17:05 <AnhVoMSFT> blackboxsw yep that was the goal - we want to be able to deploy 1000 VMs, then use cloud-init analyze output to analyze the 50th/99th percentile of where the timing was spent during system boot, and we need some more insights into phases before cloud-init started as well
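
Turning per-VM boot timings into the 50th/99th percentile figures described above could look like the sketch below. It assumes the per-phase durations (in seconds) have already been collected from each VM; the phase names and the `nearest_rank_percentile`/`summarize` helpers are hypothetical and do not reflect the output format of the proposed `boot` module.

```python
import math
from typing import Dict, Iterable, List


def nearest_rank_percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of
    the data at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing to avoid float error in the rank computation.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize(
    per_vm: List[Dict[str, float]], pcts: Iterable[int] = (50, 99)
) -> Dict[str, Dict[int, float]]:
    """Map each boot phase to its percentiles across all VMs reporting it."""
    pcts = list(pcts)
    phases = sorted({phase for vm in per_vm for phase in vm})
    summary: Dict[str, Dict[int, float]] = {}
    for phase in phases:
        values = [vm[phase] for vm in per_vm if phase in vm]
        summary[phase] = {p: nearest_rank_percentile(values, p) for p in pcts}
    return summary


if __name__ == "__main__":
    # Hypothetical timings for three VMs.
    vms = [
        {"kernel": 1.2, "systemd": 3.4, "cloud-init": 5.0},
        {"kernel": 1.1, "systemd": 3.9, "cloud-init": 6.2},
        {"kernel": 1.5, "systemd": 3.1, "cloud-init": 4.8},
    ]
    for phase, values in summarize(vms).items():
        print(phase, values)
```

Nearest-rank is only one percentile definition; the standard library's `statistics.quantiles` interpolates between samples instead, which gives slightly different numbers for small fleets.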

17:05 <rharper> cyphermox: agreed; having a pointer to suggest cleaning/checking/confirming configs if /etc/netplan/ is non-empty and netplan.io is installed

17:06 <cyphermox> rharper: one option is to parse enough of /etc/network/ to catch mentions of the interface, but that's not necessarily super solid (though it's the best option), because people can rename interfaces in netplan and match by mac

17:06 <rharper> might be helpful; though I agree that they may still ignore that; and cloud-init could do some more work to see if an image has multiple renderers available and ensure it didn't leave config for a previous boot around

17:07 <rharper> cyphermox: yeah; cloud-init knows more about the config and both formats; we're likely in a better spot to see "you've configured this interface twice"
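
A rough version of the warning discussed above (flag when /etc/netplan holds YAML files while /etc/network/interfaces also configures interfaces) is sketched below. The helpers are hypothetical and not part of cloud-init or netplan. The sketch only reads `iface` stanzas, ignores `source` includes such as /etc/network/interfaces.d, skips the "netplan.io is installed" half of the condition, and does not parse netplan YAML, so, as cyphermox points out, it cannot catch interfaces renamed or matched by MAC in netplan.

```python
from pathlib import Path
from typing import List, Optional


def eni_interfaces(eni_text: str) -> List[str]:
    """Return interface names declared by 'iface' stanzas, skipping loopback."""
    names: List[str] = []
    for line in eni_text.splitlines():
        fields = line.split()
        # Stanzas look like: iface <name> <address-family> <method>
        if len(fields) >= 2 and fields[0] == "iface" and fields[1] != "lo":
            if fields[1] not in names:
                names.append(fields[1])
    return names


def conflict_warning(netplan_files: List[str], eni_text: str) -> Optional[str]:
    """Return a warning string if both netplan and ENI configure interfaces."""
    ifaces = eni_interfaces(eni_text)
    if netplan_files and ifaces:
        return (
            "netplan config ({}) and ifupdown config for {} are both present; "
            "check that each interface is configured only once".format(
                ", ".join(sorted(netplan_files)), ", ".join(ifaces)
            )
        )
    return None


def check_system(
    netplan_dir: Path = Path("/etc/netplan"),
    eni_path: Path = Path("/etc/network/interfaces"),
) -> Optional[str]:
    """Read the standard config locations and return a warning, if any."""
    netplan_files = (
        [p.name for p in netplan_dir.glob("*.yaml")] if netplan_dir.is_dir() else []
    )
    eni_text = eni_path.read_text() if eni_path.is_file() else ""
    return conflict_warning(netplan_files, eni_text)


if __name__ == "__main__":
    message = check_system()
    print(message or "no overlapping netplan/ifupdown config found")
```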

17:08 <cyphermox> rharper: so in short, I'm not opposed to improving the UX, but I'm not wowed by any solution right now (even mine)

17:09 <rharper> cyphermox: that's fair; thanks

17:09 <AnhVoMSFT> i think a fix in cloudinit might make most stakeholders happy here. It knows which configuration file it wrote, so it can definitely look for conflicting configurations

17:09 <rharper> cyphermox: AnhVoMSFT is going to file the customer bug with details and we can discuss what (if any) improvements are to be made; I suspect cloud-init can help most here

17:09 <cyphermox> yes, I think so too

17:09 <rharper> cyphermox: thanks for the input

17:09 <AnhVoMSFT> it can't be responsible for everything the customer does though. If customer writes some my-own-netplan.yml, we can't help much

17:10 <cyphermox> rharper: but hey, if someone was to write a check when running netplan apply that there exists config in /etc/network, I wouldn't have much issues merging it

17:10 <rharper> AnhVoMSFT: right, we have several "maybe_delete_if" where we verify expected output before we remove things
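
The "verify expected output before removing" pattern can be illustrated with a small hypothetical helper: it deletes a generated config file only if the file still starts with the marker line the generator wrote, so a user-authored file like my-own-netplan.yml is left alone. The name `maybe_delete_if_generated` and the marker text are assumptions for illustration, not cloud-init's actual helpers.

```python
import tempfile
from pathlib import Path

# Hypothetical header that a generator writes as the first line of its files.
GENERATED_MARKER = "# generated by example-tool; local changes will be overwritten"


def maybe_delete_if_generated(path: Path, marker: str = GENERATED_MARKER) -> bool:
    """Delete path only if it exists and its first line equals marker.

    Returns True if the file was removed, False if it was missing or did not
    look like something the generator wrote.
    """
    if not path.is_file():
        return False
    with path.open() as handle:
        first_line = handle.readline().rstrip("\n")
    if first_line != marker:
        return False
    path.unlink()
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        generated = Path(tmp, "50-example.yaml")
        generated.write_text(GENERATED_MARKER + "\nnetwork: {version: 2}\n")
        user_owned = Path(tmp, "my-own-netplan.yml")
        user_owned.write_text("network: {version: 2}\n")
        print(maybe_delete_if_generated(generated))   # True: removed
        print(maybe_delete_if_generated(user_owned))  # False: kept
```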

17:10 <cyphermox> I just know I won't have time to look into this myself in the near future

17:10 <rharper> cyphermox: ack

17:11 <cyphermox> I think what will help most is aggressively deprecating and removing ifupdown

17:13 <cyphermox> that said, the best we can realistically do for the time being is to demote it to universe

17:13 <cyphermox> (and that's not going to change anything for UX)

17:15 <AnhVoMSFT> we had another instance of someone installing ifupdown2, which had the effect of removing cloud-init on debian/ubuntu 16.04

17:16 <AnhVoMSFT> and totally hosed his system, but that's a different issue altogether

17:27 <blackboxsw> thanks for the good discussion folks, I guess we'll just add an action item to follow up on a netplan bug for next time to see where we are at

17:31 <blackboxsw> #action follow up any bugs related to Azure/netplan uninstall in favor of ifupdown to see if cloud-init has actionable feature work to ensure proper network renderer is used

17:31 * meetingology follow up any bugs related to Azure/netplan uninstall in favor of ifupdown to see if cloud-init has actionable feature work to ensure proper network renderer is used

17:31 <blackboxsw> ok, I'll post minutes on this. thank you again rharper for driving

17:31 <blackboxsw> and for the participation robjo cyphermox and AnhVoMSFT

17:31 <blackboxsw> #endmeeting

Generated by MeetBot 0.1.5 (http://wiki.ubuntu.com/meetingology)