Cloud-init bi-weekly status

Posted on Mon 24 June 2019 in status-meeting-minutes

Meeting information

Meeting summary

Previous Actions

The discussion about "Previous Actions" started at 16:23.

Recent Changes

The discussion about "Recent Changes" started at 16:28.

In Progress Development

The discussion about "In Progress Development" started at 16:31.

Vote results

Action items, by person

Done items

  • (none)

People present (lines said)

  • blackboxsw (73)
  • rharper (23)
  • AnhVoMSFT (18)
  • nik736 (12)
  • ubot5 (4)
  • meetingology (4)

Full Log

16:18 <blackboxsw> #startmeeting Cloud-init bi-weekly status

16:18 <meetingology> Meeting started Mon Jun 24 16:18:34 2019 UTC. The chair is blackboxsw. Information about MeetBot at http://wiki.ubuntu.com/meetingology.

16:18 <meetingology>

16:18 <meetingology> Available commands: action commands idea info link nick

16:19 <blackboxsw> welcome to another episode of cloud-init status updates.

16:20 <blackboxsw> Cloud-init upstream uses this meeting as a platform for community updates, feature/bug discussions, and an opportunity to get some extra input on current development.

16:21 <blackboxsw> our format is the following topics: Previous Actions, Recent Changes, In-progress Development, Office Hours

16:21 <blackboxsw> anyone is welcome to participate, interject, make suggestions or ask questions

16:22 <blackboxsw> generally we try to host this meeting every two weeks on the day listed in the channel topic

16:23 <blackboxsw> #topic Previous Actions

16:23 <blackboxsw> last meeting

16:23 <blackboxsw> #link https://cloud-init.github.io/status-2019-06-10.html#status-2019-06-10

16:24 <blackboxsw> we had an action to follow up on any bugs related to installing ifupdown on a system that had netplan installed by default.

16:24 <blackboxsw> I believe we did see a bug come in from Azure about that.... checking for that bug id now

16:25 <blackboxsw> #1832381

16:25 <rharper> bug #1832381

16:25 <ubot5> bug 1832381 in cloud-init (Ubuntu) "vm fails to boot due to conflicting network configuration when user switches from netplan to eni" [Undecided,Incomplete] https://launchpad.net/bugs/1832381

16:25 <blackboxsw> #link https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381

16:25 <AnhVoMSFT> There is an action item on me to attach a log to that bug. Since the incident created by the customer was closed and we did not have permission to share his log, I will need to get a repro and retrieve the log. It's not very easy to trigger a mac address change in Azure these days

16:25 <blackboxsw> thanks AnhVoMSFT for this bug

16:27 <blackboxsw> ok if we carry over that action item then for next status meeting AnhVoMSFT (just to close the loop if it's important)

16:27 <AnhVoMSFT> yep - once I get some help from our networking folks to trigger a mac address change I'll update the bug with more logs

16:27 <blackboxsw> #action Touch base with AnhVoMSFT by next status on priority of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381

16:27 * meetingology Touch base with AnhVoMSFT by next status on priority of https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1832381

16:27 <ubot5> Ubuntu bug 1832381 in cloud-init (Ubuntu) "vm fails to boot due to conflicting network configuration when user switches from netplan to eni" [Undecided,Incomplete]

16:28 <blackboxsw> good deal. that's all we had for actions from last meeting

16:28 <blackboxsw> #topic Recent Changes

16:29 <blackboxsw> the following items have landed on tip of cloud-init's master branch

16:30 <blackboxsw> - sysconfig: support more bonding options [Penghui Liao]

16:30 <blackboxsw> - cloud-init-generator: use libexec path to ds-identify on redhat systems

16:30 <blackboxsw> [Ryan Harper] (LP: #1833264)

16:30 <blackboxsw> - tools/build-on-freebsd: update to python3 [Gonéri Le Bouder]

16:30 <ubot5> Ubuntu bug 1833264 in cloud-init "cloud-init-generator hardcodes path to ds-identify" [Undecided,Fix committed]

16:30 <blackboxsw> thanks to Penghui and Gonéri for driving additional changes for cloud-init in this last sessions

16:30 <blackboxsw> session*

16:31 <blackboxsw> #topic In Progress Development

16:32 <blackboxsw> there are a number of longer items for feature work in progress that should see some light soon

16:33 <blackboxsw> We track these features in trello as always

16:33 <blackboxsw> #link https://trello.com/b/hFtWKUn3/daily-cloud-init-curtin

16:33 <blackboxsw> minor fixup for Azure instance-data.json (cloud-init query) for region and availability zone should land today

16:34 <blackboxsw> rharper: and blackboxsw are working on Azure-related route tables and async disk mount features

16:36 <AnhVoMSFT> is there any bug/discussion item for the async disk mount?

16:37 <blackboxsw> AnhVoMSFT: rharper has been testing out systemd unit magic for setting up disk mounts async and initial numbers look good. How to bake that work into cloud-init is the next small hurdle I think. (I thought he mentioned today in our standup a 50% speed increase due to async mounts instead of sync waits)

16:38 <rharper> https://code.launchpad.net/~raharper/cloud-init/+git/cloud-init/+ref/feature/disk_setup_async is the WIP branch

16:39 <blackboxsw> AnhVoMSFT: I expect we'll have something in the next couple of days.

16:39 <blackboxsw> orrr right now. thanx rharper

16:39 <AnhVoMSFT> that sounds really cool. I'll check it out

16:39 * blackboxsw creates a trello card that can be watched for this feature

16:40 <blackboxsw> #link https://trello.com/c/TMK5ZDMf/1108-azure-async-disk-mounts

16:41 <blackboxsw> feel free to subscribe to any trello cards folks see that are of interest. you will get an email if the card changes state, like from Doing to Done or if new links are added

16:42 <blackboxsw> Odd_Bloke: rharper process question

16:42 <blackboxsw> what do you guys think about us turning on voting on trello cards

16:42 <blackboxsw> people with interest on a feature/card in our backlog could upvote it and that could help drive what features we grab over time

16:43 <blackboxsw> dunno, thought it might be something we could toss around to see if that would make sense. the board it public after all

16:43 <blackboxsw> is public rather

16:43 <rharper> maybe; I worry about random +1 without any more context. Platform developers already work with us; and community folks file bugs/merge proposals

16:44 <blackboxsw> good point.

16:44 <rharper> I'm open to the idea

16:45 <blackboxsw> for sure, if it gets interest, we can think about adding that feature. can't hurt to have some additional input, unfounded though it may be.

16:45 <AnhVoMSFT> agreed that the usefulness might be limited. You guys are already talking to each other. Platform developers either engage directly on this board or through out-of-band channels (sync meetings with Canonical product groups, etc.)

16:46 <AnhVoMSFT> Perhaps you can try it out for a couple release periods and see how it works out

16:46 <blackboxsw> yeah, /me just likes all the shiny objects pretty icons ;) ... need to control myself

16:47 <blackboxsw> thx AnhVoMSFT +1.

16:48 <blackboxsw> so I think that about wraps in-progress development. I know paride has been tirelessly working on our CI infrastructure to improve the quality of CI and reduce false-positive failures due to resource constraints. So big thanks to paride for working on our jenkins workers

16:48 <blackboxsw> #topic Office Hours (next ~30 mins)

16:49 <blackboxsw> This is an open topic to bring any cloud-init discussions, bugs, concerns or feature requests folks have.

16:49 <blackboxsw> In the absence of such topics we spend part of this time grooming the review queue to get back to dev

16:50 <blackboxsw> contributors so that they don't have stale branches waiting for input

16:50 <AnhVoMSFT> We had a review sent out to add some boot time telemetry collection as part of cloud-init analyze: https://code.launchpad.net/~samgilson/cloud-init/+git/cloud-init/+merge/368943

16:50 <blackboxsw> thanks AnhVoMSFT I'll grab a review slot on that one now

16:51 <AnhVoMSFT> would appreciate some reviews there and also on ideas on how to retrieve similar timestamps for FreeBSD

16:51 <rharper> AnhVoMSFT: yes, will review

16:51 <rharper> AnhVoMSFT: also, I filed a bug related to the azure telemetry, lemme get it

16:51 <blackboxsw> I'll kick off a CI run on that now

16:51 <blackboxsw> rharper: ^

16:52 <rharper> Bug 1833731

16:52 <ubot5> bug 1833731 in cloud-init "cloud-init analyze output not formatted cleanly on Azure" [Undecided,New] https://launchpad.net/bugs/1833731

16:52 <rharper> AnhVoMSFT: not sure if the branch for review addresses the formatting of the output, but we should take a look to clean it up

16:52 <AnhVoMSFT> is there a good way to subscribe to new bugs with a certain keyword/tag? I.e., I would like to auto-subscribe to all bugs that have "Azure" in the bug title

16:52 <blackboxsw> rharper: if you get a chance to double check https://code.launchpad.net/~chad.smith/cloud-init/+git/cloud-init/+merge/369199 we might be able to land that too

16:53 <rharper> blackboxsw: I asked you some questions, if you've replied, I'll look again

16:53 <blackboxsw> rharper: nevermind, I see you already looked at it

16:53 <rharper> ah

16:53 <rharper> perfect

16:53 <blackboxsw> thanks

16:53 <rharper> I think we're mostly fine; just a question on return values

16:53 * blackboxsw needed to refresh

16:54 <AnhVoMSFT> rharper I will take a look at the analyze output and see how we can improve it. If it is a minor change we can add it to the existing review

16:54 <rharper> AnhVoMSFT: no need to pull it into the existing stuff

16:55 <rharper> I'd prefer a separate targeted fix; which may land independently from the boot stage (which is super interesting on its own)

16:55 <AnhVoMSFT> cool - we will do a separate fix then

16:55 <blackboxsw> rharper: yeah that concern is clear, we are safe on the processing of region/az in absence of 'imds' key

16:55 <blackboxsw> it'll return none

16:57 <blackboxsw> by virtue of that last get('location|platformFaultDomain')

16:57 <blackboxsw> if either is absent due to any key above being absent, you'll get None as default value
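The chained-get behaviour described above can be sketched roughly as follows. This is an illustration only, not the code under review: the function name region_and_zone and the intermediate "compute" key are assumptions, while 'imds', 'location' and 'platformFaultDomain' come from the discussion.

```python
def region_and_zone(metadata):
    """Return (region, zone); each is None if any key on its path is missing.

    Hypothetical helper: the "compute" level is an assumed layout, not taken
    from the merge proposal discussed in the meeting.
    """
    # Every intermediate .get() falls back to an empty dict, so a missing
    # 'imds' key never raises KeyError. (A key explicitly set to None would
    # still break the chain; this sketch does not handle that case.)
    compute = metadata.get("imds", {}).get("compute", {})
    # The final .get() calls have no default, so absent values come back as None.
    return compute.get("location"), compute.get("platformFaultDomain")
```

With an empty dict, or a dict that has 'imds' but nothing below it, both values come back as None instead of raising.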

16:58 <rharper> blackboxsw: ack

17:00 <nik736> Hi there, not sure if this is the right place to ask, but I have problems when creating a new VM, it only happens with the debian cloud image, ubuntu is fine. Booting is stuck at the drm line, the exact line is dependent on the video model type in my libvirt xml but it is basically stuck for 20-30sec and won't continue. It will boot eventually after that time. Thanks so much for any hints. Happy to provide

17:00 <nik736> further details.

17:02 <blackboxsw> hrm, video model timeouts are a bit out of my wheelhouse :/

17:02 * blackboxsw pokes around a bit in google

17:02 <nik736> it seems to be that the lines after it would be about resizing the file system. I am not really sure if this is cloud-init related at all and I am not sure if it actually is caused by the video model or is just taking a bit to get to the next steps

17:03 <blackboxsw> nik736: you can run cloud-init analyze show or cloud-init analyze blame to see what cloud-init says it is spending a lot of time on

17:03 <nik736> I tried different host systems, Ubuntu 18.04, 19.04, Debian 9, different libvirt versions, different qemu versions, nothing seems to be helping lol

17:03 <blackboxsw> (If you have cloud-init v 18 or later in your image I think)

17:04 <nik736> ah, ok, thanks, I will look into that

17:04 <blackboxsw> nik736: also systemd-analyze blame is a good helper for what is killing boot time

17:04 <AnhVoMSFT> do you see any timestamp gap that reflects the 20-30s in cloud-init.log ?
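One way to look for such a gap is to scan the log for the largest jumps between consecutive timestamps. The sketch below assumes each log line starts with a timestamp of the form "2019-06-24 17:13:41,123"; that format, and the helper name largest_gaps, are assumptions for illustration and are not part of cloud-init.

```python
import re
from datetime import datetime

# Assumed line prefix: "YYYY-MM-DD HH:MM:SS,mmm" (date, time, milliseconds).
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})")


def largest_gaps(lines, top=3):
    """Return up to `top` tuples (gap_seconds, earlier_line_no, later_line_no)."""
    stamped = []
    for number, line in enumerate(lines, start=1):
        match = TS.match(line)
        if not match:
            continue  # skip continuation lines that carry no timestamp
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        when = when.replace(microsecond=int(match.group(2)) * 1000)
        stamped.append((number, when))
    # Pair each timestamped line with the next one and measure the delta.
    gaps = [
        ((later - earlier).total_seconds(), first_no, second_no)
        for (first_no, earlier), (second_no, later) in zip(stamped, stamped[1:])
    ]
    return sorted(gaps, reverse=True)[:top]
```

Passing an open file object for /var/log/cloud-init.log as `lines` would list the line pairs with the longest pauses between them; a gap of 20 to 30 seconds between two adjacent entries points at where the time went.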

17:08 <rharper> nik736: feel free to file a bug and attach logs from the 'cloud-init collect-logs' output (or serial console if available) and /var/log/cloud-init.log if you can get into the instance afterwards

17:08 <nik736> thanks for the help, currently looking into it

17:11 <AnhVoMSFT> rharper blackboxsw we have some instance deployment where cloud-init is hanging at the command ip route add - any idea how to look further?

17:12 <AnhVoMSFT> this does look like a platform problem, so it is more of a question related to networking, rather than cloud-init itself

17:14 <AnhVoMSFT> it's super hard to reproduce so the only thing we have so far to work with is logs. I thought the call to ip route add basically adds an entry to the kernel routing table. Is there an interaction with networking involved which might cause it to hang?

17:14 <rharper> AnhVoMSFT: I wonder if it's creating a route that breaks connection to IMDS or something else that cloud-init would then do an HTTP get on ?

17:17 <nik736> I saw in the log that 2 entries are around 1 minute apart "SUCCESS: searching for local datasources" and "Cloud-init v. 0.7.9 running 'init' at Mon, 24 Jun 2019 17:13:41 +0000. Up 73.67 seconds." I am not sure if this could be it or if this looks fine

17:18 <rharper> 0.7.9 is quite old, seeing the full cloud-init.log will be most useful for us to understand what's happening

17:19 <nik736> okay, sec

17:19 <AnhVoMSFT> rharper that is a good theory. I do see in a good case there's a call to IMDS immediately after that, although that call has a timeout. If it fails we should see more logs coming out of cloud-init. I'll look further into that today

17:20 <nik736> @rharper https://pastebin.com/fzCSH5kC

17:20 <rharper> AnhVoMSFT: the retry logic in DataSourceAzure is quite long IIRC, so it's quite possible this is the very issue that blackboxsw is working w.r.t ensuring the instances always have a source-ip route to the IMDS

17:21 <AnhVoMSFT> rharper indeed it is long, and the log was overly suppressed to avoid log from growing too large while VM was waiting in pre-provisioning state. We are adding back some of the logs (in a smarter way to get enough details while avoiding huge log size)

17:22 <rharper> nik736: so, between line 260 and 261 there's a large timedelta; that's outside of cloud-init; cloud-init is executed separately 4 times (cloud-init init --local, cloud-init init, cloud-init modules --mode=config, cloud-init modules --mode=final)

17:23 <rharper> nik736: so if you have a systemd journal, we could see what happens between the end of cloud-init-local.service and cloud-init.service (stage1 and 2);

17:23 <nik736> ah, okay, interesting

17:23 <nik736> will check

17:23 <rharper> or syslog might see stuff between those two time points

17:23 * rharper steps away for a bit, please keep sending info here; I'll respond when I'm back

17:24 <AnhVoMSFT> nik736 systemd-analyze critical-chain cloud-init.service might help here - I think some systemd service is running right after init-local and just before init and that service is taking time

17:25 <nik736> will check, thanks for your help, really appreciate it.

17:32 <blackboxsw> I think I'll wrap the meeting here, but we can continue the conversation. Thanks again folks for the discussions

17:33 <blackboxsw> next meeting will be July 8th

17:33 <blackboxsw> as updated in the topic

17:33 <blackboxsw> meeting minutes will be posted to

17:33 <blackboxsw> #link https://cloud-init.github.io

17:33 <blackboxsw> #endmeeting

Generated by MeetBot 0.1.5 (http://wiki.ubuntu.com/meetingology)